C structs and pointers confusion - c

There are a number of threads on this subject, some of which have been helpful, but I need some specific help.
Say we have this code:
typedef struct A {
int b;
struct other* c;
} A_t;
struct other {
int d;
};
So, we've got a struct called A, containing an int (called b) and a pointer to a struct of type "other", called c.
Now say that later on we have:
A* pA;
which we assume points to a properly allocated instance of A.
My first question is, how do I access "d"? I've tried:
int z = (*pA).(*c).d;
but I get a compile error about expecting an identifier before the '(' token.
My second question is, what's the difference between referring to the data type as A vs. A_t? In other words, we define the struct as type A, but there's the "A_t" part at the end of the definition. I've gotten lots of vague descriptions of what it is, but I don't get it, it seems like we've just randomly decided to call the same data type by two different names, for fun.
Side note, the code above isn't straight from my project, so I can't guarantee it'll produce the same problems I've encountered. It's an assignment and I wanted to avoid posting my actual code, so I just wrote something simpler but similar.
I appreciate it!

Basic rules:
You use prefix-* to dereference a pointer and postfix-. to access a field.
Postfix operators have higher precedence than prefix operators, so you need parentheses if you want to dereference first.
So taking those into account, you start with your pointer pA. You want to dereference it to get the struct:
*pA
Then you want to get the c field, which will require parens around what you already have:
(*pA).c
Now dereference that to get the other object:
*(*pA).c
and then get the d field, which requires parentheses again:
(*(*pA).c).d
This task of "dereference and then get a field" is so common that there's a short-cut postfix operator -> to do it in one step and avoid the parentheses. First deref pA and get c:
pA->c
then deref again and get d
pA->c->d

what's the difference between referring to the data type as A vs. A_t?
...
decided to call the same data type by two different names, for fun
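The typedef makes A_t an alias for struct A. In C (unlike C++), the tag A is not a type name on its own, so you write either struct A or A_t. The alias is there for "fun", or more usually just for the convenience of having a shorter way to refer to the data type.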
This is a conventional way to dereference members:
int z = pA->c->d;
From C99 §6.5.2.3:
The first operand of the -> operator shall have type ''pointer to qualified or unqualified
structure'' or ''pointer to qualified or unqualified union'', and the second operand shall
name a member of the type pointed to.

Related

The difference between "char* variable" and "char *variable" in C [duplicate]

Why do most C programmers name variables like this:
int *myVariable;
rather than like this:
int* myVariable;
Both are valid. It seems to me that the asterisk is a part of the type, not a part of the variable name. Can anyone explain this logic?
They are EXACTLY equivalent.
However, in
int *myVariable, myVariable2;
It seems obvious that myVariable has type int*, while myVariable2 has type int.
In
int* myVariable, myVariable2;
it may seem obvious that both are of type int*, but that is not correct as myVariable2 has type int.
Therefore, the first programming style is more intuitive.
If you look at it another way, *myVariable is of type int, which makes some sense.
Something nobody has mentioned here so far is that this asterisk is actually the "dereference operator" in C.
*a = 10;
The line above doesn't mean I want to assign 10 to a, it means I want to assign 10 to whatever memory location a points to. And I have never seen anyone writing
* a = 10;
have you? So the dereference operator is pretty much always written without a space. This is probably to distinguish it from a multiplication broken across multiple lines:
x = a * b * c * d
* e * f * g;
Here *e would be misleading, wouldn't it?
Okay, now what does the following line actually mean:
int *a;
Most people would say:
It means that a is a pointer to an int value.
This is technically correct. Most people like to read it that way, and that is how modern C standards define it (note that the C language itself predates all the ANSI and ISO standards). But it's not the only way to look at it. You can also read this line as follows:
The dereferenced value of a is of type int.
So in fact the asterisk in this declaration can also be seen as a dereference operator, which also explains its placement. That a is a pointer is not really declared at all; it's implied by the fact that the only thing you can actually dereference is a pointer.
The C standard only defines two meanings to the * operator:
indirection operator
multiplication operator
Indirection is just a single meaning; there is no extra meaning for declaring a pointer. Indirection is what the dereference operation does: it performs an indirect access. On this reading, the * in a statement like int *a; also denotes an indirect access, and thus the second statement above is much closer to the standard than the first one is.
Because the * in that line binds more closely to the variable than to the type:
int* varA, varB; // This is misleading
As @Lundin points out below, const adds even more subtleties to think about. You can entirely sidestep this by declaring one variable per line, which is never ambiguous:
int* varA;
int varB;
The balance between clear code and concise code is hard to strike — a dozen redundant lines of int a; isn't good either. Still, I default to one declaration per line and worry about combining code later.
I'm going to go out on a limb here and say that there is a straight answer to this question, both for variable declarations and for parameter and return types, which is that the asterisk should go next to the name: int *myVariable;. To appreciate why, look at how you declare other types of symbol in C:
int my_function(int arg); for a function;
float my_array[3] for an array.
The general pattern, referred to as declaration follows use, is that the type of a symbol is split up into the part before the name, and the parts around the name, and these parts around the name mimic the syntax you would use to get a value of the type on the left:
int a_return_value = my_function(729);
float an_element = my_array[2];
and: int copy_of_value = *myVariable;.
C++ throws a spanner in the works with references, because the syntax at the point where you use references is identical to that of value types, so you could argue that C++ takes a different approach to C. On the other hand, C++ retains the same behaviour of C in the case of pointers, so references really stand as the odd one out in this respect.
A great guru once said "Read it the way of the compiler, you must."
http://www.drdobbs.com/conversationsa-midsummer-nights-madness/184403835
Granted this was on the topic of const placement, but the same rule applies here.
The compiler reads it as:
int (*a);
not as:
(int*) a;
If you get into the habit of placing the star next to the variable, it will make your declarations easier to read. It also avoids eyesores such as:
int* a[10];
-- Edit --
To explain exactly what I mean when I say it's parsed as int (*a): it means that * binds more tightly to a than it does to int, in very much the same manner that, in the expression 4 + 3 * 7, the 3 binds more tightly to 7 than it does to 4 due to the higher precedence of *.
With apologies for the ascii art, a synopsis of the A.S.T. for parsing int *a looks roughly like this:
                Declaration
                /         \
               /           \
   Declaration-             Init-
    Specifiers              Declarator-
        |                   List
        |                     |
        |                    ...
      "int"                   |
                          Declarator
                           /     \
                          /      ...
                      Pointer       \
                         |        Identifier
                         |            |
                        "*"           |
                                     "a"
As is clearly shown, * binds more tightly to a since their common ancestor is Declarator, while you need to go all the way up the tree to Declaration to find a common ancestor that involves the int.
That's just a matter of preference.
When you read the code, distinguishing between variables and pointers is easier in the second case, but it may lead to confusion when you are putting both variables and pointers of a common type in a single line (which itself is often discouraged by project guidelines, because it decreases readability).
I prefer to declare pointers with their corresponding sign next to type name, e.g.
int* pMyPointer;
People who prefer int* x; are trying to force their code into a fictional world where the type is on the left and the identifier (name) is on the right.
I say "fictional" because:
In C and C++, in the general case, the declared identifier is surrounded by the type information.
That may sound crazy, but you know it to be true. Here are some examples:
int main(int argc, char *argv[]) means "main is a function that takes an int and an array of pointers to char and returns an int." In other words, most of the type information is on the right. Some people think function declarations don't count because they're somehow "special." OK, let's try a variable.
void (*fn)(int) means fn is a pointer to a function that takes an int and returns nothing.
int a[10] declares 'a' as an array of 10 ints.
pixel bitmap[height][width].
Clearly, I've cherry-picked examples that have a lot of type info on the right to make my point. There are lots of declarations where most--if not all--of the type is on the left, like struct { int x; int y; } center.
This declaration syntax grew out of K&R's desire to have declarations reflect the usage. Reading simple declarations is intuitive, and reading more complex ones can be mastered by learning the right-left-right rule (sometimes called the spiral rule or just the right-left rule).
C is simple enough that many C programmers embrace this style and write simple declarations as int *p.
In C++, the syntax got a little more complex (with classes, references, templates, enum classes), and, as a reaction to that complexity, you'll see more effort put into separating the type from the identifier in many declarations. In other words, you might see more int* p-style declarations if you check out a large swath of C++ code.
In either language, you can always have the type on the left side of variable declarations by (1) never declaring multiple variables in the same statement, and (2) making use of typedefs (or alias declarations, which, ironically, put the alias identifiers to the left of types). For example:
typedef int array_of_10_ints[10];
array_of_10_ints a;
A lot of the arguments in this topic are plain subjective and the argument about "the star binds to the variable name" is naive. Here's a few arguments that aren't just opinions:
The forgotten pointer type qualifiers
Formally, the "star" neither belongs to the type nor to the variable name, it is part of its own grammatical item named pointer. The formal C syntax (ISO 9899:2018) is:
(6.7) declaration:
    declaration-specifiers init-declarator-list(opt) ;
Where declaration-specifiers contains the type (and storage), and the init-declarator-list contains the pointer and the variable name. Which we see if we dissect this declarator list syntax further:
(6.7.6) declarator:
    pointer(opt) direct-declarator
...
(6.7.6) pointer:
    * type-qualifier-list(opt)
    * type-qualifier-list(opt) pointer
Where a declarator is the whole declaration, a direct-declarator is the identifier (variable name), and a pointer is the star followed by an optional type qualifier list belonging to the pointer itself.
What makes the various style arguments about "the star belongs to the variable" inconsistent, is that they have forgotten about these pointer type qualifiers. int* const x, int *const x or int*const x?
Consider int *const a, b;, what are the types of a and b? Not so obvious that "the star belongs to the variable" any longer. Rather, one would start to ponder where the const belongs to.
You can definitely make a sound argument that the star belongs to the pointer type qualifier, but not much beyond that.
The type qualifier list for the pointer can cause problems for those using the int *a style. Those who use pointers inside a typedef (which we shouldn't, very bad practice!) and think "the star belongs to the variable name" tend to write this very subtle bug:
/*** bad code, don't do this ***/
typedef int *bad_idea_t;
...
void func (const bad_idea_t *foo);
This compiles cleanly. Now you might think the code is made const correct. Not so! This code is accidentally a faked const correctness.
The type of foo is actually int*const*: the pointer that foo points to was made read-only, not the pointed-at data. So inside this function we can do **foo = n; and it will change the variable value in the caller.
This is because in the expression const bad_idea_t *foo, the * does not belong to the variable name here! In pseudo code, this parameter declaration is to be read as const (bad_idea_t *) foo and not as (const bad_idea_t) *foo. The star belongs to the hidden pointer type in this case - the type is a pointer and a const-qualified pointer is written as *const.
But then the root of the problem in the above example is the practice of hiding pointers behind a typedef and not the * style.
Regarding declaration of multiple variables on a single line
Declaring multiple variables on a single line is widely recognized as bad practice 1). CERT-C sums it up nicely as:
DCL04-C. Do not declare more than one variable per declaration
Read as plain English, common sense agrees that a declaration should declare one thing.
And it doesn't matter if the variables are pointers or not. Declaring each variable on a single line makes the code clearer in almost every case.
So the argument about the programmer getting confused over int* a, b is bad. The root of the problem is the use of multiple declarators, not the placement of the *. Regardless of style, you should be writing this instead:
int* a; // or int *a
int b;
Another sound but subjective argument would be that given int* a the type of a is without question int* and so the star belongs with the type qualifier.
But basically my conclusion is that many of the arguments posted here are just subjective and naive. You can't really make a valid argument for either style - it is truly a matter of subjective personal preference.
1) CERT-C DCL04-C.
Because it makes more sense when you have declarations like:
int *a, *b;
For declaring multiple pointers in one line, I prefer int* a, * b; which more intuitively declares "a" as a pointer to an integer, and doesn't mix styles when likewise declaring "b." Like someone said, I wouldn't declare two different types in the same statement anyway.
When you initialize and assign a variable in one statement, e.g.
int *a = xyz;
you assign the value of xyz to a, not to *a. This makes
int* a = xyz;
a more consistent notation.

What exactly does the C Structure Dot Operator Do (Lower Level Perspective)?

I have a question regarding structs in C. So when you create a struct, you are essentially defining the framework of a block of memory. Thus when you create an instance of a struct, you are creating a block of memory such that it is capable of holding a certain number of elements.
However, I'm somewhat confused on what the dot operator is doing. If I have a struct Car and have a member called GasMileage (which is an int member), I am able to get the value of GasMileage by doing something like,
int x = CarInstance.GasMileage;
However, I'm confused as to what is actually happening with this dot operator. Does the dot operator simply act as an offset from the base address? And how exactly is it able to deduce that it is an int?
I guess I'm curious as to what is going on behind the scenes. Would it be possible to reference GasMileage by doing something else? Such as
int *GasMileagePointer = (&carInstance + offsetInBytes(GasMileage));
int x = *GasMileagePointer;
This is just something I quickly made up. I've tried hard searching for a good explanation, but nothing seems to explain it any further than treating the dot operator as magic.
When you use the . operator, the compiler translates this to an offset inside the struct, based on the size of the fields (and padding) that precede it.
For example:
struct Car {
    char model[52];
    int doors;
    int GasMileage;
};
Assuming an int is 4 bytes and no padding, the offset of model is 0, the offset of doors is 52, and the offset of GasMileage is 56.
So if you know the offset of the member, you could get a pointer to it like this:
int *GasMileagePointer = (int*)((char *)&carInstance + offsetInBytes(GasMileage));
The cast to char * is necessary so that pointer arithmetic goes 1 byte at a time instead of sizeof(carInstance) bytes at a time. Then the result needs to be cast to the correct pointer type, in this case int *.
Yes, the dot operator simply applies an offset from the base of the structure, and then accesses the value at that address.
int x = CarInstance.GasMileage;
is equivalent to:
int x = *(int *)((char*)&CarInstance + offsetof(struct Car, GasMileage));
For a member with some other type T, the only difference is that the cast (int *) becomes (T *).
The dot operator simply selects the member.
Since the compiler has information about the type (and consequently size) of the member (all members, actually), it knows the offset of the member from the start of the struct and can generate appropriate instructions. It may generate a base+offset access, but it also may access the member directly (or even have it cached in a register). The compiler has all those options since it has all the necessary information at compile time.
If it hasn't, like for incomplete types, you'll get a compile-time error.
When it works, the behavior of the "." operator is equivalent to taking the address of the structure, offsetting it by the offset of the member, converting the result to a pointer to the member type, and dereferencing it. The Standard, however, provides that there are situations where that isn't guaranteed to work. For example, given:
struct s1 {int x,y; };
struct s2 {int x,y; };
void test1(struct s1 *p1, struct s2 *p2)
{
    p1->x++;
    p2->x^=1;
    p1->x--;
    p2->x^=1;
}
a compiler may decide that there's no legitimate way that p1->x and p2->x can identify the same object, so it may reorder the code so that the ++ and -- operations on p1->x cancel, and the ^=1 operations on p2->x cancel, thus leaving a function that does nothing.
Note that the behavior is different when using unions, since given:
union u { struct s1 v1; struct s2 v2; };
void test2(union u *uv)
{
    uv->v1.x^=1;
    uv->v2.x++;
    uv->v1.x^=1;
    uv->v2.x--;
}
the common-initial-subsequence rule indicates that since uv->v1 and uv->v2 start with fields of the same types, an access to such a field in uv->v1 is equivalent to an access to the corresponding field in uv->v2. Thus, a compiler is not allowed to resequence things. On the other hand, given
void test1(struct s1 *p1, struct s2 *p2);
void test3(union u *uv)
{
    test1(&(uv->v1), &(uv->v2));
}
the fact that uv->v1 and uv->v2 start with matching fields doesn't guard against a compiler's assumption that the pointers won't alias.
Note that some compilers offer an option to force generation of code where member accesses always behave equivalently to the aforementioned pointer operations. For gcc, the option is -fno-strict-aliasing. If code will need to access common initial members of varying structure types, omitting that switch may cause one's code to fail in weird, bizarre, and unpredictable ways.

C, Struct pointer polymorphism

NOTE: this is NOT a C++ question; I can't use a C++ compiler, only a C99 one.
Is this valid (and acceptable, beautiful) code?
typedef struct sA{
int a;
} A;
typedef struct aB{
struct sA a;
int b;
} B;
A aaa;
B bbb;
void init(){
bbb.b=10;
bbb.a.a=20;
set((A*)&bbb);
}
void set(A* a){
aaa=*a;
}
void useLikeB(){
printf("B.b = %d", ((B*)&aaa)->b);
}
In short, is it valid to cast a "sub class" to a "super class" and afterwards cast the "super class" back to the "sub class" when I need its specific behavior?
Thanks
First of all, the C99 standard permits you to cast any struct pointer to a pointer to its first member, and the other way (6.7.2.1 Structure and union specifiers):
13 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
In other words, in your code you are free to:
(1) Convert B* to A*: this always works correctly.
(2) Convert A* to B*: but if it doesn't actually point to a B, accessing the further members is undefined behavior and you're going to get random failures.
(3) Assign the structure pointed to through A* to an A: but if the pointer was converted from B*, only the common members will be assigned and the remaining members of B will be ignored.
(4) Assign the structure pointed to through B* to an A: but you have to convert the pointer first, and note (3).
So, your example is almost correct. But useLikeB() won't work correctly, since aaa is a struct of type A which you assigned as stated in point (4). This has two results:
The non-common B members won't actually be copied to aaa (as stated in (3)),
Your program will fail randomly trying to access an A as a B, which it isn't (you're accessing a member which is not there, as stated in (2)).
To explain that in a more practical way: when you declare aaa, the compiler reserves the amount of memory necessary to hold all members of A. B has more members, and thus requires more memory. As aaa is a regular variable, it can't change its size at run time and thus can't hold the remaining members of B.
And as a note, by (1) you can practically take a pointer to the member instead of converting the pointer which is nicer, and it will allow you to access any member, not only the first one. But note that in this case, the opposite won't work anymore!
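The points above can be sketched roughly as follows. The types mirror the question's A and B (the tag sB is used here in place of aB), and the helper names as_A, as_B_unchecked, copy_base and round_trip_b are invented for this illustration.

```c
#include <assert.h>

typedef struct sA { int a; } A;
typedef struct sB { A a; int b; } B;   /* A must be the first member */

/* "Upcast": take the address of the first member; no cast needed. */
static A *as_A(B *pb) { return &pb->a; }

/* "Downcast": valid only if pa really points at the first member of a B. */
static B *as_B_unchecked(A *pa) { return (B *)pa; }

/* Copying through A* slices the object: only the A part is copied. */
static A copy_base(const A *pa) { return *pa; }

static int round_trip_b(void)
{
    B bbb = { { 20 }, 10 };
    A *pa = as_A(&bbb);                   /* points into bbb */
    assert((void *)pa == (void *)&bbb);   /* same address as the whole struct */
    B *pb = as_B_unchecked(pa);           /* fine: pa still refers to bbb */
    return pb->b;
}
```

The round trip works only because the pointer never stops referring to bbb. Once the A part has been copied into a separate A variable, as in the question's set(), there is no B left to convert back to.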
I think this is quite dirty and relatively hazardous. What are you trying to achieve with this? Also, there is no guarantee that aaa is a B; it might also be an A, so when someone calls "useLikeB" it might fail. Also, depending on the architecture, "int a" and "pointer to struct a" might or might not overlap correctly, which can result in interesting stuff happening when you assign to "int a" and then access "struct a".
Why would you do this? Having
set((A*)&bbb);
is not easier to write than the correct
set(&bbb.a);
Other things that you should please avoid when you post here:
you use set before it is declared
aaa=a should be aaa = *a
First of all, I agree with most concerns from previous posters about the safety of these assignments.
With that said, if you need to go that route, I'd add one level of indirection and some type-safety checkers.
static const int struct_a_id = 1;
static const int struct_b_id = 2;
struct MyStructPtr {
int type;
union {
A* ptra;
B* ptrb;
//continue if you have more types.
}; // anonymous union: standard in C11, a common extension in C99 compilers
};
The idea is that you manage your pointers by passing them through a struct that contains some "type" information. You can build a tree of classes on the side that describe your class tree (note that given the restrictions for safely casting, this CAN be represented using a tree) and be able to answer questions to ensure you are correctly casting structures up and down. So your "useLikeB" function could be written like this.
struct MyStructPtr the_ptr;
void init_ptr(A* pa)
{
the_ptr.type = struct_a_id;
the_ptr.ptra = pa;
}
void useLikeB(){
//This function should FAIL IF aaa CAN'T BE SAFELY CAST TO B,
//by checking in your type tree that the stored type is B or below
//B (not necessarily a direct child).
assert( is_castable_to(the_ptr.type,struct_b_id ) );
printf("B.b = %d", the_ptr.ptrb->b);
}
My 2 cents.
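The answer leaves is_castable_to undefined. One possible way to fill it in, purely as an assumption, is a parent table that is walked from the stored type up to the root. The enum values, the parent_of table, struct tagged_ptr and checked_as_B below are all hypothetical names for this sketch, which uses a void* instead of the union so that it stays within C99.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sA { int a; } A;
typedef struct sB { A a; int b; } B;

enum type_id { TYPE_A = 0, TYPE_B = 1, TYPE_COUNT };

/* parent_of[t] is the "super class" of t, or -1 for a root type. */
static const int parent_of[TYPE_COUNT] = { -1, TYPE_A };

/* Hypothetical helper: true if an object created as `from` may be viewed
   as `to`, i.e. `to` is `from` itself or one of its ancestors. */
static int is_castable_to(int from, int to)
{
    assert(from >= 0 && from < TYPE_COUNT);
    for (int t = from; t != -1; t = parent_of[t])
        if (t == to)
            return 1;
    return 0;
}

struct tagged_ptr {
    int type;     /* type the object was created as */
    void *obj;    /* the object itself */
};

/* Returns the B view, or NULL when the object was not created as a B. */
static B *checked_as_B(struct tagged_ptr p)
{
    return is_castable_to(p.type, TYPE_B) ? (B *)p.obj : NULL;
}
```

With a table like this, converting a B to an A always passes the check, while converting something created as an A to a B is rejected before any member is touched.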

How do you read C declarations?

I have heard of some methods, but none of them have stuck. Personally I try to avoid complex types in C and try to break them into component typedefs.
I'm now faced with maintaining some legacy code from a so called 'three star programmer', and I'm having a hard time reading some of the ***code[][].
How do you read complex C declarations?
This article explains seven relatively simple rules which will let you read any C declaration, if you find yourself wanting or needing to do so manually: http://www.ericgiguere.com/articles/reading-c-declarations.html
1. Find the identifier. This is your starting point. On a piece of paper, write "declare identifier as".
2. Look to the right. If there is nothing there, or there is a right parenthesis ")", goto step 4.
3. You are now positioned either on an array (left bracket) or function (left parenthesis) descriptor. There may be a sequence of these, ending either with an unmatched right parenthesis or the end of the declarator (a semicolon or a "=" for initialization). For each such descriptor, reading from left to right:
   if an empty array "[]", write "array of"
   if an array with a size, write "array size of"
   if a function "()", write "function returning"
   Stop at the unmatched parenthesis or the end of the declarator, whichever comes first.
4. Return to the starting position and look to the left. If there is nothing there, or there is a left parenthesis "(", goto step 6.
5. You are now positioned on a pointer descriptor, "*". There may be a sequence of these to the left, ending either with an unmatched left parenthesis "(" or the start of the declarator. Reading from right to left, for each pointer descriptor write "pointer to". Stop at the unmatched parenthesis or the start of the declarator, whichever is first.
6. At this point you have either a parenthesized expression or the complete declarator. If you have a parenthesized expression, consider it as your new starting point and return to step 2.
7. Write down the type specifier. Stop.
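To see the seven steps applied, here is a small worked example on a declaration made up for this illustration. The names x, y, row_t, getter_t, names, get_names and third_name are not from the article. The typedef version builds the same type in pieces, and the assignments compile without a diagnostic only if the two types really match.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1:     identifier x                  -> "declare x as"
   Steps 2-3:  to the right is [3]           -> "array size 3 of"
   Steps 4-5:  to the left is *              -> "pointer to"
   Step 6:     (*x[3]) is the new starting point
   Steps 2-3:  to the right is (void)        -> "function returning"
   Steps 4-5:  to the left is *              -> "pointer to"
   Step 6:     (*(*x[3])(void)) is the new starting point
   Steps 2-3:  to the right is [5]           -> "array size 5 of"
   Steps 4-5:  to the left is *              -> "pointer to"
   Step 7:     type specifier                -> "char"                    */
char *(*(*x[3])(void))[5];

/* The same type, built up from component typedefs. */
typedef char *row_t[5];             /* array size 5 of pointer to char */
typedef row_t *(*getter_t)(void);   /* pointer to function returning pointer to row_t */
getter_t y[3];                      /* array size 3 of getter_t */

static char *names[5] = { "a", "b", "c", "d", "e" };
static row_t *get_names(void) { return &names; }

static const char *third_name(void)
{
    x[0] = get_names;      /* element types of x and y must agree */
    y[0] = x[0];
    return (*y[0]())[2];   /* index y, call, dereference, index the row */
}
```

Reading the collected phrases in order gives "declare x as array size 3 of pointer to function returning pointer to array size 5 of pointer to char".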
If you're fine with a tool, then I second the suggestion to use the program cdecl: http://gd.tuwien.ac.at/linuxcommand.org/man_pages/cdecl1.html
I generally use what is sometimes called the 'right hand clockwise rule'.
It goes like this:
Start from the identifier.
Go to the immediate right of it.
Then move clockwise and come to the left hand side.
Move clockwise and come to the right side.
Do this as long as the declaration has not been parsed fully.
There's an additional meta-rule that has to be taken care of:
If there are parentheses, complete each level of parentheses before moving out.
Here, 'going' and 'moving' somewhere means reading the symbol there. The rules for that are:
* - pointer to
() - function returning
(int, int) - function taking two ints and returning
int, char, etc. - int, char, etc.
[] - array of
[10] - array of ten
etc.
So, for example, int* (*xyz[10])(int*, char) is read as:
xyz is an
array of ten
pointer to
function taking an int* and a char and returning
an int*
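A short sketch of how the xyz declaration from the example might be used. The functions bump, clear and apply are hypothetical, written only to have something of the right type to store in the array.

```c
#include <assert.h>

/* Two functions matching "function taking an int* and a char and returning an int*". */
static int *bump(int *p, char by)      { *p += by; return p; }
static int *clear(int *p, char unused) { (void)unused; *p = 0; return p; }

/* xyz is an array of ten pointers to such functions; unused slots are null. */
static int *(*xyz[10])(int *, char) = { bump, clear };

/* Calls the function stored in the given slot on a local copy of start. */
static int apply(int slot, int start, char arg)
{
    assert(slot >= 0 && slot < 10 && xyz[slot] != 0);
    int value = start;
    int *result = xyz[slot](&value, arg);   /* index, then call, then use the int* */
    return *result;
}
```

The expression in apply follows the same order as the reading: xyz is indexed first, the selected pointer is called, and the call yields an int*.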
One word: cdecl
Damnit, beaten by 15 seconds!
Cdecl (and c++decl) is a program for encoding and decoding C (or C++) type declarations.
http://gd.tuwien.ac.at/linuxcommand.org/man_pages/cdecl1.html
Back when I was doing C, I made use of a program called "cdecl". It appears that it's in Ubuntu Linux in the cutils or cdecl package, and it's probably available elsewhere.
cdecl offers a command line interface so let's give it a try:
cdecl> explain int ***c[][]
declare c as array of array of pointer to pointer to pointer to int
Another example:
cdecl> explain int (*IMP)(ID,SEL)
declare IMP as pointer to function (ID, SEL) returning int
However, there is a whole chapter about that in the book "Expert C Programming: Deep C Secrets", named "Unscrambling Declarations in C".
Just came across an illuminating section in "The Development of the C Language":
For each object of such a composed type, there was already a way to mention the underlying object: index the array, call the function, use the indirection operator on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,
int i, *pi, **ppi;
declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, *pi, and **ppi all yield an int type when used in an expression. Similarly,
int f(), *f(), (*f)();
declare a function returning an integer, a function returning a pointer to an integer, a pointer to a function returning an integer;
int *api[10], (*pai)[10];
declare an array of pointers to integers, and a pointer to an array of integers. In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.
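The "declaration mirrors use" idea from the quoted passage can be tried out with api and pai directly. This is only a sketch; the function name demo_api_pai and the local variables a, b and row are invented for it.

```c
#include <assert.h>

static int demo_api_pai(void)
{
    int a = 1, b = 2;
    int row[10] = { 0 };
    int *api[10] = { &a, &b };   /* *api[i] is an int, so api is an array of pointers */
    int (*pai)[10] = &row;       /* (*pai)[i] is an int, so pai points to an array */

    *api[1] = 5;                 /* index first, then dereference: writes b */
    (*pai)[3] = 7;               /* dereference first, then index: writes row[3] */

    /* Dereferencing pai gives the whole ten-element array. */
    assert(sizeof *pai == 10 * sizeof(int));
    return b + row[3];
}
```

In both cases the expression used to reach an int has the same shape as the declarator, which is the observation the passage makes.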
There's also a Web-based version of cdecl which is pretty slick.
Common readability problems include function pointers, the fact that arrays decay to pointers in most expressions, and the fact that multidimensional arrays are arrays of arrays stored contiguously. Hope that helps some.
In any case, whenever you do understand the declarations, maybe you can figure out a way to simplify them to make them more readable for the next guy.
The automated solution is cdecl.
In general, you declare a variable the way you use it. For example, you dereference a pointer p as in:
char c = *p;
and you declare it in a similar-looking way:
char * p;
Same goes for hairy function pointers. Let's declare f to be good old "pointer to function returning pointer to int," and an external declaration just to be funny. It's a pointer to a function, so we start with:
extern * f();
It returns a pointer to an int, so somewhere in front there's
extern int * * f(); // XXX not quite yet
Now which binds tighter? Postfix () binds tighter than prefix *, so the line above actually declares a function returning int **. Put parentheses around *f so that f is a pointer first:
extern int *(*f)();
Declare it the way you use it.

Resources