The C11 standard defines struct compatibility as follows (6.2.7):
Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types…
That means I can have 2 files like this:
foo.c:
struct struc {
int x;
};
int foo(struct struc *s)
{
return s->x;
}
main.c:
struct struc {
float x;
};
int foo(struct struc *s);
int main(void)
{
return foo(&(struct struc){1.2f});
}
Smells like undefined behavior (as it is for types like int and float). But if I am understanding the standard correctly (maybe I am misinterpreting the second sentence), this is allowed. If so, what is the rationale behind this? Why not also specify that structs in separate translation units must also be structurally equivalent?
Smells like undefined behavior
Because it is.
But if I am understanding the standard correctly
This doesn't seem to be the case in this particular instance.
this is allowed.
Nope. I do not see (and you do not explain) how the standard language could be interpreted this way.
The standard says
If both are completed anywhere within their respective translation units
This condition holds in your example.
then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types
This requirement is not satisfied, so the types are not compatible.
Why not also specify that structs in separate translation units must also be structurally equivalent?
The standard specifies exactly that. "[o]ne-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types" is precisely the definition of structural equivalence.
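For illustration only, here is a minimal sketch of how the example becomes well-defined once both translation units agree on one member list. It assumes the definition lives in a shared header (the name struc.h is hypothetical) and that the float member is the one wanted; the conversion to int is written out explicitly.

```c
/* struc.h (hypothetical shared header): the single definition that
   both foo.c and main.c would include. */
struct struc {
    float x;
};

/* foo.c: sees the same member list as the caller, so the two
   declarations of struct struc are compatible across translation units. */
int foo(struct struc *s)
{
    return (int)s->x;   /* explicit float-to-int conversion, truncates toward zero */
}
```

With this arrangement, main.c can keep its call foo(&(struct struc){1.2f}) unchanged.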
Related
From what I understand, the main reason people separate function declarations and definitions is so that the functions can be used in multiple compilation units. So then I was wondering, what's the point of violating DRY this way, if structures don't have prototypes and would still cause ODR problems across compilation units? I decided to try and define a structure twice using a header across two compilation units, and then combining them, but the code compiled without any errors.
Here is what I did:
main.c:
#include "test.h"
int main() {
return 0;
}
a.c:
#include "test.h"
test.h:
#ifndef TEST_INCLUDED
#define TEST_INCLUDED
struct test {
int a;
};
#endif
Then I ran the following gcc commands.
gcc -c a.c
gcc -c main.c
gcc -o final a.o main.o
Why does the above work and not give an error?
C's one definition rule (C17 6.9p5) applies to the definition of a function or an object (i.e. a variable). struct test { int a; }; does not define any object; rather, it declares the identifier test as a tag of the corresponding struct type (6.7.2.3 p7). This declaration is local to the current translation unit (i.e. source file) and it is perfectly fine to have it in several translation units. For that matter, you can even declare the same identifier as a tag for different types in different source files, or in different scopes, so that struct test is an entirely different type in one file / function / block than another. It would probably be confusing, but legal.
If you actually defined an object in test.h, e.g. struct test my_test = { 42 };, then you would be violating the one definition rule, and the behavior of your program would be undefined. (But that does not necessarily mean you will get an error message; multiple definitions are handled in various different ways by different implementations.)
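As a sketch of the usual way around that, assuming you do want one shared object: the header carries only an extern declaration, and exactly one source file carries the definition. Everything is shown in one file here for brevity, and the accessor read_my_test is a hypothetical name used only for illustration.

```c
/* The type declaration: safe to repeat in every translation unit. */
struct test {
    int a;
};

/* test.h would carry only this declaration; it defines no object. */
extern struct test my_test;

/* Exactly one .c file would carry the definition. */
struct test my_test = { 42 };

/* Hypothetical accessor, only here to show the object being used. */
int read_my_test(void)
{
    return my_test.a;
}
```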
The key section in the standard is nearly indigestible, but §6.2.7 Compatible type and composite type covers the details, with some forward references:
¶1 Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for declarators.55) Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.
¶2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
¶3 A composite type can be constructed from two types that are compatible; it is a type that is compatible with both of the two types and satisfies the following conditions:
If both types are array types, the following rules are applied:
If one type is an array of known constant size, the composite type is an array of that size.
Otherwise, if one type is a variable length array whose size is specified by an expression that is not evaluated, the behavior is undefined.
Otherwise, if one type is a variable length array whose size is specified, the composite type is a variable length array of that size.
Otherwise, if one type is a variable length array of unspecified size, the composite type is a variable length array of unspecified size.
Otherwise, both types are arrays of unknown size and the composite type is an array of unknown size.
The element type of the composite type is the composite type of the two element types.
If only one type is a function type with a parameter type list (a function prototype), the composite type is a function prototype with the parameter type list.
If both types are function types with parameter type lists, the type of each parameter in the composite parameter type list is the composite type of the corresponding parameters.
These rules apply recursively to the types from which the two types are derived.
¶4 For an identifier with internal or external linkage declared in a scope in which a prior declaration of that identifier is visible,56) if the prior declaration specifies internal or external linkage, the type of the identifier at the later declaration becomes the composite type.
55) Two types need not be identical to be compatible.
56) As specified in 6.2.1, the later declaration might hide the prior declaration.
Emphasis added
The second part of ¶1 covers explicitly the case of structures, unions and enumerations declared in separate translation units. It is crucial to allowing separate compilation. Note footnote 55 too. However, if you use the same header to define a given structure (union, enumeration) in separate translation units, the chances of you not using a compatible type are small. It can be done if there is conditional compilation and the conditions are different in the two translation units, but you usually have to be trying quite hard to run into problems.
From cppreference:
1) Label name space: all identifiers declared as labels.
2) Tag names: all identifiers declared as names of structs, unions and enumerated types.
3) Member names: all identifiers declared as members of any one struct or union. Every struct and union introduces its own name space of this kind.
4) All other identifiers, called ordinary identifiers to distinguish from (1-3) (function names, object names, typedef names, enumeration constants).
This allows for code like this (among other things):
struct Point { int x, y; };
struct Point Point;
This code seems somewhat unclear to me as Point can refer to both a type and an instance of a struct. What was the motivation behind having separate name spaces for tags and other identifiers?
The actual question posed is
What was the motivation behind having separate name spaces for tags and other identifiers?
This can be answered only by reference to the standard committee's rationale document, which in fact does address the matter, however briefly:
Pre-C89 implementations varied considerably in the number of separate name spaces maintained. The position adopted in the Standard is to permit as many separate name spaces as can be distinguished by context, except that all tags (struct, union, and enum) comprise a single name space.
(C99 rationale document,* section 6.2.3)
Thus, it is explicitly intentional that code such as
struct point { int point; } point = { .point = 0 };
goto point;
point:
return point.point;
is permitted. My interpretation of the rationale is that the intention was to be unrestrictive, though it remains unclear why the different kinds of tags were not given separate namespaces. This could not have been accidental, so one or more parties represented on the committee must have opposed separate tag namespaces, and they managed to prevail. Such opposition could very well have been for business instead of technical reasons.
*As far as I am aware, there is no rationale document for the C2011 standard. At least, not yet.
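To make the fragment above concrete, here is a small sketch wrapped in a function so it compiles on its own. The function name point_demo and the value 7 are placeholders chosen for illustration.

```c
int point_demo(void)
{
    /* Tag "point", member "point", and object "point" live in three
       different name spaces, so none of them collide. */
    struct point { int point; } point = { .point = 7 };

    goto point;          /* "point" looked up in the label name space */
point:
    return point.point;  /* ordinary identifier, then member name */
}
```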
I'm designing an application and came across an implementation issue. I have the following struct definition:
app.h:
struct application_t{
void (*run_application)(struct application_t*);
void (*stop_application)(struct application_t*);
};
struct application_t* create();
The problem came when I tried to "implement" this application_t. I tend to define another struct:
app.c:
struct tcp_application_impl_t{
void (*run_application)(struct application_t*);
void (*stop_application)(struct application_t*);
int client_fd;
int socket_fd;
};
struct application_t* create(){
struct tcp_application_impl_t * app_ptr = malloc(sizeof(struct tcp_application_impl_t));
//do init
return (struct application_t*) app_ptr;
}
So if I use this as follows:
#include "app.h"
int main(){
struct application_t *app_ptr = create();
(app_ptr -> run_application)(app_ptr); //Is this behavior well-defined?
(app_ptr -> stop_application)(app_ptr); //Is this behavior well-defined?
}
What confuses me is whether the call (app_ptr -> run_application)(app_ptr); yields UB.
The "static type" of app_ptr is struct application_t*, but the "dynamic type" is struct tcp_application_impl_t*. struct application_t and struct tcp_application_impl_t are not compatible by N1570 6.2.7(p1):
there shall be a one-to-one correspondence between their members such
that each pair of corresponding members are declared with compatible
types
which obviously is not true in this case.
Can you please provide a reference to the Standard explaining the behavior?
Your two structs aren't compatible since they are different types. You have already found the chapter "compatible types" that defines what makes two structs compatible. The UB comes later when you access these structs with a pointer to the wrong type, strict aliasing violation as per 6.5/7.
The obvious way to solve this would have been this:
struct tcp_application_impl_t{
struct application_t app;
int client_fd;
int socket_fd;
};
Now the types may alias, since tcp_application_impl_t is an aggregate containing an application_t among its members.
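A rough sketch of how create() could look with the embedded struct, under a few assumptions: the callbacks tcp_run and tcp_stop and the helper client_fd_of are hypothetical names, and their bodies are placeholders rather than real socket code. It relies on the rule that a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <stdlib.h>

struct application_t {
    void (*run_application)(struct application_t *);
    void (*stop_application)(struct application_t *);
};

struct tcp_application_impl_t {
    struct application_t app;   /* must stay the first member */
    int client_fd;
    int socket_fd;
};

/* Placeholder behavior: a real implementation would open sockets here. */
static void tcp_run(struct application_t *base)
{
    struct tcp_application_impl_t *impl = (struct tcp_application_impl_t *)base;
    impl->client_fd = 1;
}

static void tcp_stop(struct application_t *base)
{
    struct tcp_application_impl_t *impl = (struct tcp_application_impl_t *)base;
    impl->client_fd = -1;
}

struct application_t *create(void)
{
    struct tcp_application_impl_t *impl = malloc(sizeof *impl);
    if (impl == NULL)
        return NULL;
    impl->app.run_application = tcp_run;
    impl->app.stop_application = tcp_stop;
    impl->client_fd = -1;
    impl->socket_fd = -1;
    return &impl->app;          /* address of the first member, no cast needed */
}

/* Hypothetical helper so callers can observe the private state. */
int client_fd_of(struct application_t *base)
{
    return ((struct tcp_application_impl_t *)base)->client_fd;
}
```

The caller's code from the question stays the same: app_ptr->run_application(app_ptr) now goes through a genuine struct application_t object.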
An alternative way to make this well-defined is to use a sneaky special rule, the "union common initial sequence", found hidden in C17 6.5.2.3/6:
One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the completed type of the union
is visible. Two structures share a common initial sequence if corresponding members
have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This would allow you to use your original types as you declared them. But somewhere in the same translation unit, you will have to add a dummy union typedef to utilize the above rule:
typedef union
{
struct application_t app;
struct tcp_application_impl_t impl;
} initial_sequence_t;
You don't need to actually use any instance of this union, it just needs to sit there visible. This tells the compiler that these two types are allowed to alias, as far as their common initial sequence goes. In your case, it means the function pointers but not the trailing variables in tcp_application_impl_t.
Edit:
Disclaimer. The common initial sequence trick is apparently a bit controversial, with compilers doing other things with it than the committee intended. And possibly works differently in C and C++. See union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?
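For comparison, the least contested use of the common initial sequence guarantee is to inspect the shared members through the union object itself. The sketch below assumes two made-up message structs (int_msg, real_msg) and a made-up reader function; it is not the question's code.

```c
/* Two hypothetical structs sharing the initial member "int kind". */
struct int_msg  { int kind; int value; };
struct real_msg { int kind; double value; };

union msg {
    struct int_msg  i;
    struct real_msg r;
};

/* Reads the common initial member through the union, whichever
   struct was stored last. */
int msg_kind(const union msg *m)
{
    return m->i.kind;
}
```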
If the "strict aliasing rule" (N1570 6.5p7) is interpreted merely as specifying the circumstances under which things may alias (which would seem to be what the authors intended, given Footnote 88, which says "The intent of this list is to specify those circumstances in which an object may or may not be aliased") code like yours should pose no problem provided that in all contexts where an object is accessed using lvalues of two different types, one of the involved lvalues is visibly freshly derived from the other.
The only way 6.5p7 can make any sense is if operations involving objects that are freshly visibly derived from other objects are recognized as operations on the originals. The question of when to recognize such derivation is left as a quality-of-implementation issue, however, because the authors thought the marketplace would be better able than the Committee to judge what was necessary for something to be a "quality" implementation suitable for some particular purpose.
If the goal is to write code that will work on implementations which are configured to honor the clear intention of footnote 88, one should be safe provided that objects don't alias. Upholding this requirement may require that one ensure that the compiler can see either that pointers are related to each other, or that they are each freshly derived from a common object at point of use. Given, e.g.
thing1 *p1 = &unionArray[i].member1;
int v1 = p1->x;
thing2 *p2 = &unionArray[j].member2;
p2->x = 31;
thing1 *p3 = &unionArray[i].member1;
int v2 = p3->x;
each pointer would be used in a context where it was freshly derived from unionArray, and thus there would be no aliasing even if i==j. A compiler like "icc" will have no problems with such code, even with -fstrict-aliasing enabled, but because both gcc and clang impose the requirements of 6.5p7 upon programmers even in cases not involving aliasing, they will not process it correctly.
Note that if the code had been:
thing1 *p1 = &unionArray[i].member1;
int v1 = p1->x;
thing2 *p2 = &unionArray[j].member2;
p2->x = 31;
int v2 = p1->x;
then the second use of p1 would alias p2 in cases where i==j because p2 would access the storage associated with p1, via means not involving p1, between the time p1 is formed and the last time it is used (thus aliasing p1).
According to the authors of the Standard, the Spirit of C includes the principles "Trust the programmer" and "Don't prevent the programmer from doing what needs to be done". Unless there is a particular need to cope with the limitations of an implementation that is not particularly well suited to what one is doing, one should target implementations that uphold the Spirit of C in a fashion appropriate to one's purposes. The -fstrict-aliasing dialect processed by icc, or the -fno-strict-aliasing dialects processed by icc, gcc, and clang, should be suitable for your purposes. The -fstrict-aliasing dialects of gcc and clang should be recognized as simply unsuitable for your purposes, and not worth targeting.
This is in regards to some old (pre-C89) code that I'm working on. Part of that code is a small library that defines these structures in the header file:
// lib.h
struct data_node {
const struct data_node *next;
const struct data_node *prev;
void *data;
};
struct trace_node {
const struct trace_node *next;
const struct trace_node *prev;
unsigned int id;
const char *file;
int line;
};
const struct trace_node *get_trace(void);
The source file redefines those same structures, like so:
// lib.c
// does *not* include "lib.h"
struct data_node {
struct data_node *next;
struct data_node *prev;
void *data;
};
struct trace_node {
struct trace_node *next;
struct trace_node *prev;
unsigned int id;
const char *file;
int line;
struct data_node *syncData; /* not included in header file version */
};
It works like you would expect: the syncData field is not visible to client code that includes the "lib.h" header.
Background
The library maintains 2 internal lists: the trace list, and the data list. The syncData field keeps the 2 lists in sync (go figure).
If client code had access to the syncData field, it could disrupt the synchronization between the lists. But, the trace list can get pretty large, so rather than copying every node into a smaller version of the struct, it just returns the address of the sentinel node for the internal list.
Question
I've compiled this with -Wall, -Wpedantic, and -Wextra, and I can't get gcc to complain about it, both with -std=c99 and -std=c11. A hex dump of the memory shows the bytes for the hidden field, right where they ought to be.
The relevant section of the standard (6.2.7.1) says:
Two types have compatible type if their types are the same. Additional rules for
determining whether two types are compatible are described in 6.7.2 for type specifiers,
in 6.7.3 for type qualifiers, and in 6.7.5 for declarators.46) Moreover, two structure,
union, or enumerated types declared in separate translation units are compatible if their
tags and members satisfy the following requirements: If one is declared with a tag, the
other shall be declared with the same tag. If both are complete types, then the following
additional requirements apply: there shall be a one-to-one correspondence between their
members such that each pair of corresponding members are declared with compatible
types, and such that if one member of a corresponding pair is declared with a name, the
other member is declared with the same name. For two structures, corresponding
members shall be declared in the same order. For two structures or unions, corresponding
bit-fields shall have the same widths. For two enumerations, corresponding members
shall have the same values.
Which, depending on how you want to read it, could be taken to say that compatible struct definitions are restricted to ONLY having corresponding pairs of members (and no others), or that struct definitions are compatible if, where they do have corresponding pairs of members, those pairs meet the requirements.
I don't think this is Undefined Behavior. At worst, I think it may be unspecified. Should I refactor this to use 2 distinct struct definitions? Doing this would require a performance hit to allocate a new public node for each node in the internal list and copy over the public data.
This has undefined behavior, and for good reasons.
First, the text clearly states that compatible structs must have a one-to-one correspondence between members. So the behavior is undefined if client and library access the same object. A compiler can't detect that undefined behavior, because this rule is about stitching together knowledge from two different translation units that are compiled separately. This is why you don't see any diagnostic.
The reason that your example is particularly bad practice is that not even the sizes of the two struct types agree, and maybe not even their alignment. So a client that accesses such an object will make false assumptions about optimization opportunities.
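The size mismatch can be made visible in a single file by giving the two layouts different tag names. That renaming is an assumption made purely so both definitions fit in one translation unit; the function name hidden_bytes is likewise hypothetical, and void * stands in for the struct data_node pointer.

```c
#include <stddef.h>

/* Layout that client code sees through lib.h. */
struct public_trace_node {
    const struct public_trace_node *next;
    const struct public_trace_node *prev;
    unsigned int id;
    const char *file;
    int line;
};

/* Layout that lib.c actually allocates. */
struct private_trace_node {
    struct private_trace_node *next;
    struct private_trace_node *prev;
    unsigned int id;
    const char *file;
    int line;
    void *syncData;   /* the member hidden from clients */
};

/* How many bytes the client's idea of a node falls short by. */
size_t hidden_bytes(void)
{
    return sizeof(struct private_trace_node) - sizeof(struct public_trace_node);
}
```

On common ABIs the difference is at least the size of one pointer, so anything in client code that depends on sizeof (copying a node, indexing an array of nodes) would use the wrong size.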
If lib.c does not include lib.h then definitions from there are not visible to lib.c so no conflicts.
C does not use function overloading and so it does not use name mangling and so if you have this declaration:
struct trace_node *get_trace(void);
in one place but function is implemented as
struct foo_trace_node *get_trace(void);
then the linker will happily link your code with that get_trace()
What you do directly violates the standard. A "one-to-one correspondence" would require the pointer to be seen by client code as well. Your code violates the first part of that sentence.
Imagine client code unlinking one structure from the list, then creating and linking in one of its own structures with the wrong size: the library would crash sooner or later when dereferencing that pointer.
If you don't want to expose certain fields of structures, don't expose the structure at all. Hand the client code an anonymous structure pointer and expose accessor functions in the lib that return field values the client is allowed to see. Or pack the "allowed part" into an embedded structure within the larger one and hand that to client code if you want to avoid field-by-field access. It's probably also not a good idea to have client code see the link pointers of your list structure.
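A minimal sketch of the opaque-pointer variant, shown in one file for brevity. The accessor names trace_next and trace_id, and the self-linked sentinel used as the list contents, are assumptions made for illustration; only struct trace_node and get_trace come from the question.

```c
#include <stddef.h>

/* ---- what lib.h would expose ---- */
struct trace_node;                       /* incomplete type: no members visible */
const struct trace_node *get_trace(void);
const struct trace_node *trace_next(const struct trace_node *n);
unsigned int trace_id(const struct trace_node *n);

/* ---- what lib.c would keep private ---- */
struct data_node;

struct trace_node {
    struct trace_node *next;
    struct trace_node *prev;
    unsigned int id;
    const char *file;
    int line;
    struct data_node *syncData;          /* never reachable from client code */
};

/* Empty circular list: the sentinel links to itself. */
static struct trace_node sentinel = { &sentinel, &sentinel, 0u, "", 0, NULL };

const struct trace_node *get_trace(void)
{
    return &sentinel;
}

const struct trace_node *trace_next(const struct trace_node *n)
{
    return n->next;
}

unsigned int trace_id(const struct trace_node *n)
{
    return n->id;
}
```

Because there is now only one complete definition of struct trace_node, no compatibility question arises, and no per-node copy is needed.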
I found some additional information in
The New C Standard
An Economic and Cultural Commentary
Derek M. Jones
derek#knosof.co.uk
http://www.coding-guidelines.com/cbook/cbook1_2.pdf
This particular version covers the C99 standard, but since the text for the relevant section is identical in both versions of the standard, it's a wash.
Pertinent commentary:
633
Moreover, two structure, union, or enumerated types declared in
separate translation units are compatible if their tags and members
satisfy the following requirements:
Commentary
These requirements apply if the structure or union type was declared
via a typedef or through any other means. Because there can be more
than one declaration of a type in the same translation unit, these
requirements really apply to the composite type in each translation
unit. In the following list of requirements, those that only apply to
structures, unions, enumerations, or a combination thereof are
explicitly called out as such. Two types are compatible if they obey
both of the following requirements:
• Tag compatibility.
If both types have tags, both shall be the same.
If one, or neither, type has a tag, there is no requirement to be obeyed.
• Member compatibility.
Here the requirement is that for every member in both types there is a
corresponding member in the other type that has the following
properties:
The corresponding members have a compatible type.
The corresponding members either have the same name or are unnamed.
For structure types, the corresponding members are defined in the same order in their respective definitions.
For structure and union types, the corresponding members shall either both be bit-fields having the same width, or neither shall be
bit-fields.
For enumerated types, the corresponding members (the enumeration constants) have the same value.
So the verdict is unanimous. The wording in Mr. Jones' commentary is perhaps a little bit clearer (to me, at any rate).
I failed to mention in the OP that the original header file comments for the get_trace() function clearly state that the trace list is to be considered strictly read-only, so the points raised about an object of the abbreviated structure in client code finding its way back into the library code are - while still valid in the general sense - not exactly applicable in this specific case.
However, the question of compiler optimizations is bang on, especially considering how much more aggressive compiler optimizations are now, versus 35 years ago. So, I will refactor.
I have two header files as mentioned below:
file1.h
typedef struct can_type {
int x;
float y;
} M_can_type;
file2.h
typedef struct can_type {
int x;
float y;
} can_type;
Can I define both structures of the same type in different files given as above where I want to pass one structure to another? Also how to map two different type structures to the same type so that I can pass elements of one structure to other?
Yes, it is legal. Indeed, it is necessary. And it is explicit in the standard, but it is in one of the more turgid and nearly incomprehensible sections of the standard.
ISO/IEC 9899:2011 §6.2.7 Compatible type and composite type
¶1 Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for declarators.55) Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.
55) Two types need not be identical to be compatible.
The section from 'Moreover' onwards discusses the situation you are asking about.
Note that although you have two different typedef names (M_can_type and can_type) in your two headers, the structures that are defined meet the requirements. Remember, typedef names are only aliases for existing other types (so M_can_type is an alias for struct can_type in files that include file1.h and can_type is an alias for struct can_type in files that include file2.h). Because each header defines the structure type, any given source file can only include (directly or indirectly) one of the two headers. If you tried to include both, you'd get the structure type redefined, and that is not allowed (even in C11, where you can have the same typedef name redefined as long as it defines the same type, but you still can't have two definitions of the structure type at the same scope in a single translation unit).
The most common way of ensuring that the types in the separate translation units are compatible is to use a single header to define the type and to include that header in both translation units. However, if you think about it, the compiler doesn't know or care about whether that's what you do. All that matters to it is that the text it sees after preprocessing identifies the same type.
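As a small sketch of the point about typedef names being aliases, suppose the struct is defined once and both typedef names are visible together (an assumption: in the question they live in separate headers). The function names can_sum and can_sum_of are hypothetical.

```c
/* One definition of the structure type. */
struct can_type {
    int x;
    float y;
};

typedef struct can_type M_can_type;   /* alias used by file1.h */
typedef struct can_type can_type;     /* alias used by file2.h */

/* Written against one alias... */
float can_sum(const M_can_type *m)
{
    return (float)m->x + m->y;
}

/* ...and called with an object declared through the other alias.
   No conversion is involved: both names denote struct can_type. */
float can_sum_of(can_type c)
{
    return can_sum(&c);
}
```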