Struct that apparently defines no instances in Unix v6 - c

I'm going through the code of Unix version 6 with the Lions book. One of the header files (param.h) defines the following structs:
/*struct to access integers*/
/*single integer */
struct { int integ; };
/*in bytes*/
struct { char lobyte; char hibyte; };
These structures don't seem to define any instances, nor are they named so that they can be used later. Does anybody know what their use is?
Thanks

If someone included the whole file in a union declaration, it would allow them to access the different parts.
It would be something like:
union{
#include <param.h>
} myparam;
myparam.integ = 0xDEAD;
assert((myparam.lobyte & 0xFF) == 0xAD); /* mask: char may be signed */
assert((myparam.hibyte & 0xFF) == 0xDE);
(Depends on endianness of architecture...)
So having looked around a bit, it seems that in old versions of C you wouldn't have needed to declare the union; there was only one namespace for all struct/union members, and each member name just translated into a byte offset that you could use on any variable. The best mention of this I could find is here:
http://docs.sun.com/source/806-3567/compat.html
Describing pre-ISO Sun C:
Allows struct, union, and arithmetic types using member selection operators ('.', '->') to work on members of other struct(s) or unions.

Back in those days, the members of structures all shared the same namespace, not one namespace per structure. Consequently, each element of a structure had to have a unique name across all structures, or the same element had to appear with the same type at the same offset in every structure in which it appeared. Quite how that was used with these, I'm not sure, but I suspect you could do:
int x;
x.lobyte = 1;
x.hibyte = 2;
Or something analogous to that.
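In ISO C the member names no longer apply to arbitrary variables, so the same overlay has to be spelled out explicitly. The sketch below is one hypothetical modern equivalent, assuming a 16-bit word as on the PDP-11; the names word, lobyte(), hibyte() and first_byte_in_memory() are illustrative and do not come from param.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two tagless V6 structs: one 16-bit
   word viewed either as an integer or as two bytes in memory order. */
union word {
    uint16_t integ;
    uint8_t  byte[2];
};

/* Endianness-independent extraction using shifts and masks. */
static uint8_t lobyte(uint16_t w) { return (uint8_t)(w & 0xFFu); }
static uint8_t hibyte(uint16_t w) { return (uint8_t)(w >> 8); }

/* Reads the first byte in memory through the union. Reading a member
   other than the one last written is permitted for unions in C99 and later. */
static uint8_t first_byte_in_memory(uint16_t value)
{
    union word w;
    w.integ = value;
    assert(w.byte[0] == lobyte(value) || w.byte[0] == hibyte(value));
    return w.byte[0];
}
```

On a little-endian host the first byte in memory is the low byte (0xAD for 0xDEAD); on a big-endian host it is the high byte. The shift-and-mask helpers give the same answer everywhere.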
See also:
http://www.cs.bell-labs.com/who/dmr/chist.html
http://www.cs.bell-labs.com/who/dmr/primevalC.html
(Neither of those seems to answer this question, though.)

Related

What is the real difference between struct and typedef struct in C?

There are two main ways of defining structs:
struct triangle_s {
int a,b,c;
};
and
typedef struct triangle_s {
int a,b,c;
} triangle;
It has been asked many times but every answer is about not having to write struct so many times when using the typedef variant. Is there a real difference other than you can avoid repeating the struct keyword? I've heard that you should never use the typedef variant in C but nobody said why.
There is no difference, typedef just removes the requirement to prefix variable declarations with struct.
Whether or not to typedef structs by default is a religious war fought mostly by people with too much time on their hands. When working inside an existing code base you should do whatever the coding standard or the surrounding code does. For your own personal code do whatever you prefer.
There is no such thing as a "typedef struct".
Part the first: structure types
struct introduces a structure, which is an aggregate datatype consisting of a set of named members. (Arrays are also aggregates, but they consist of a number of identical members which are indexed. Unions have a set of member names, but can only contain one member at a time, so they are not aggregates. You probably didn't need to know that.)
Structure types usually have tags, so the actual typename will be something like struct Triangle. The tag (Triangle) is not in the same namespace as identifiers, so there is no problem using a tag which is also used for another purpose. Some people like to append tags with _s and _u, indicating that they are structure or union tags respectively, but that's not my style; I prefer to name types in CamelCase, and for me a structure or union tag is standing in for a typename. But of course you are free to use your own conventions.
If you use struct SomeTag in a program, you are effectively declaring that there is a structure whose tag is SomeTag. You're not required to fill in the declaration by naming or describing the structure's members, unless you need to refer to them in your code. A structure whose members have not (yet) been declared is called incomplete, but it can still be used as part of a pointer type because the C standard guarantees that all structure pointers have the same format, regardless of the contents of the structure. (That doesn't make them interchangeable, but it does mean that the compiler knows how big the pointers are.) A structure which never has its members defined and which is used only as the target of a pointer type is called opaque.
You can complete the declaration of a structure by adding a block of member declarations. So
struct Triangle {
int a,b,c;
};
first declares that there is a structure whose name is struct Triangle, and then fills in the definition of that structure by declaring three named members which are all ints.
Union declarations and definitions are all very similar, by the way.
A structure definition can be used in a declaration as though it were a type name. Or to put it another way, you can declare the tag for a structure type, immediately fill in the fields, and then declare one or more variables of that type:
struct Triangle { int a, b, c; } aTriangle, anotherTriangle;
That's not a very common style, but it's important to know that the syntax is possible.
Finally, it is legal to define a structure without giving it a tag. Tagless structure types have a quirk: normally no two structure types can have the same tag, but all tagless structures are distinct. That means that you can declare a structure type which effectively has no name, and which is different from any other structure type, even a structure type with exactly the same members. That can be slightly useful if you have an aggregate which will only ever have one instance (a "singleton"), although I wouldn't really ever use this style myself. But after a small detour, we'll see another use for this feature.
Part the second: type aliases
C type names can be quite complicated, since they can be built up out of pieces. For example, const struct Triangle*[8] is an array of eight members, each of which is a pointer to an unmodifiable struct Triangle. double (*)(const struct Triangle*[8]) is a pointer to a function which accepts one such array as an argument (or, more accurately, which accepts a pointer to the first element of such an array, because of array-to-pointer decay. But that's not relevant here.)
To make complex types a bit easier to use, C allows you to declare an alias name for a type. Aliases are declared with typedef and otherwise look exactly like the declaration of a variable. So, for example, you can declare a variable of type int with
int someNumber;
and thus you can declare an alias for the type int with
typedef int someType;
Similarly, you could declare an array of eight pointers to const Triangle elements with
const Triangle* eightSlices[8];
In exactly the same way, you can declare a name for the type of such an array with:
typedef const Triangle* EightSlices[8];
Note that the name of the type goes exactly where the name of the object would go, which can be somewhere in the middle of the declaration.
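As a sketch of the "name goes in the middle" rule, here are the array alias from above and a function-pointer alias side by side. SliceFn and total_perimeter are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Triangle { int a, b, c; } Triangle;

/* The alias name sits exactly where a variable name would sit. */
typedef const Triangle *EightSlices[8];          /* array of 8 pointers */
typedef double (*SliceFn)(const Triangle *[8]);  /* pointer to function */

/* Hypothetical helper: sums the perimeters of the eight triangles. */
static double total_perimeter(const Triangle *slices[8])
{
    double sum = 0.0;
    for (size_t i = 0; i < 8; i++) {
        assert(slices[i] != NULL);
        sum += slices[i]->a + slices[i]->b + slices[i]->c;
    }
    return sum;
}
```

A variable declared as EightSlices slices; can be passed directly to a SliceFn such as total_perimeter, because the array decays to a pointer to its first element.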
Part the third: both of the above in one statement
As a simple example of declaring type aliases, here's how you declare an alias for a structure type:
An incomplete structure definition:
typedef struct Triangle Triangle;
Or a complete structure definition:
typedef struct Triangle {
int a, b, c;
} Triangle;
Or both, separately (and these could go in either order):
typedef struct Triangle Triangle;
struct Triangle {
int a, b, c;
};
Remember that structure tags and other identifiers (such as type aliases) are in different namespaces, so there is no conflict between the two uses of Triangle above. Some programmers feel that it is necessary to distinguish between them, even though one of them can only be used immediately after the word struct and the other one cannot be used following the word struct. Others -- and I think you can guess that I fall into this crowd -- find it convenient to deliberately use the same name for both, relying on the presence or absence of the word struct to let us know whether it is a type alias or a tag. (And, more commonly, to indicate that we have no intention of ever again using the tag.)
So, back to my opening comment:
There is no such thing as a "typedef struct".
And there isn't. What we have here is a very ordinary struct, declared and defined. And we have a very ordinary type alias which gives an alternate name for that struct. And that's it.
Note that you can give an alias to an anonymous type (such as a tagless structure type), after which the type is no longer anonymous. So there are some people who would leave out the tag in the above definition:
typedef struct {
int a, b, c;
} Triangle;
That looks a lot like the singleton structure type mentioned above, but since it is a type it can be used to declare multiple instances. But I don't actually recommend this style.
To each their own: an ignorable appendix
Everyone has their own style preferences, and most of these preferences are valid. Most of us have worked on more than one project, and since every project tends to develop its own style guide, we need to learn how to accept and use different styles in different projects. But I think most of us have some style with which we feel most comfortable, which we will revert to when we're writing code just for ourselves (or when we're starting a project with the intention of attracting colleagues prepared to conform to our style). And what I've used above is my style.
In fact, I try to avoid the condensed declaration+definition+alias syntax, preferring the two-declaration version shown above:
typedef struct Triangle Triangle; /* type alias */
struct Triangle {
int a, b, c;
}; /* type definition */
The reason I prefer that is that it lets me define types with self-referring members, such as linked lists and trees:
typedef struct TriangleList TriangleList;
struct TriangleList {
Triangle slice;
TriangleList* next;
};
(If I hadn't forward-aliased the type, I would have had to declare the member as struct TriangleList* next; which makes for even uglier alignment.)
Sometimes I end up with mutually referring types, and in that case, I need to gather the aliases together before any of the structure definitions. That also can be advantageous, because the alias definitions allow for opaque use of pointers to the type and can therefore be placed into a public header which does not include the type definitions at all.
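A minimal sketch of the alias-first layout, with the two typedef lines gathered ahead of the structure definitions. The helpers list_length and list_demo are hypothetical and exist only to exercise the types.

```c
#include <assert.h>
#include <stddef.h>

/* Aliases first: these two lines could live alone in a public header. */
typedef struct Triangle Triangle;
typedef struct TriangleList TriangleList;

struct Triangle { int a, b, c; };

struct TriangleList {
    Triangle slice;
    TriangleList *next;   /* the alias is already usable here */
};

/* Hypothetical helper: number of nodes in the list. */
static size_t list_length(const TriangleList *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Hypothetical demo: builds a two-node list on the stack and
   returns the sum of all side lengths. */
static int list_demo(void)
{
    TriangleList tail = { {3, 4, 5}, NULL };
    TriangleList head = { {1, 1, 1}, &tail };
    int sum = 0;
    assert(list_length(&head) == 2);
    for (const TriangleList *p = &head; p != NULL; p = p->next)
        sum += p->slice.a + p->slice.b + p->slice.c;
    return sum;
}
```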
But that's all just me. Feel free to ignore it.
There are two main ways of defining structs
No, there is only one:
struct triangle_s {
int a,b,c;
}
. Such a definition can appear in several contexts, including, but not limited to, the two you have presented, but it's important to understand that the two statements you offered are each more than a structure definition.
This one's "more" is admittedly a bit vacuous:
struct triangle_s {
int a,b,c;
};
It is a declaration of the structure type and zero objects of that type. There is an identifier list between the closing brace and the semicolon, which in that particular case happens to be empty. But it could also be this, for example:
struct triangle_s {
int a,b,c;
} triangle;
, which defines the type struct triangle_s and declares triangle as an object of that type. And of course, that leads us to the second example
typedef struct triangle_s {
int a,b,c;
} triangle;
, which is exactly the same as the preceding one, except that the typedef keyword indicates that the identifier(s) declared are type aliases instead of object identifiers. Again, the structure definition is just the part from the struct keyword to the associated closing brace, and note that typedef can be used to declare aliases for any type, not just structures.
Is there a real difference other than you can avoid repeating the
struct keyword?
You have a definition of struct triangle_s either way, and you can use that form of its type name either way. In the latter case, you can also use triangle (wherever it is in scope) as an alias for struct triangle_s, for any and every purpose that could be served by the form struct triangle_s, with no difference in meaning. That is the point of typedef.
I've heard that you should never use the typedef
variant in C but nobody said why.
Some people argue against the second form as a matter of style, on the basis that it commingles two different things in the same declaration -- a definition of the structure type and a declaration of an alias for it. That line of reasoning continues that if you want the typedef, then you should declare it separately:
struct triangle_s {
int a,b,c;
};
typedef struct triangle_s triangle;
Again, however, this is a matter of code style, not functionality.
There are other style points surrounding the use of typedef about which I, personally, feel much more strongly.

Convention for declaring structs in C [duplicate]

I have seen many programs consisting of structures like the one below
typedef struct
{
int i;
char k;
} elem;
elem user;
Why is it needed so often? Any specific reason or applicable area?
As Greg Hewgill said, the typedef means you no longer have to write struct all over the place. That not only saves keystrokes, it also can make the code cleaner since it provides a smidgen more abstraction.
Stuff like
typedef struct {
int x, y;
} Point;
Point point_new(int x, int y)
{
Point a;
a.x = x;
a.y = y;
return a;
}
becomes cleaner when you don't need to see the "struct" keyword all over the place; it looks more as if there really is a type called "Point" in your language. Which, after the typedef, is the case, I guess.
Also note that while your example (and mine) omitted naming the struct itself, actually naming it is also useful for when you want to provide an opaque type. Then you'd have code like this in the header, for instance:
typedef struct Point Point;
Point * point_new(int x, int y);
and then provide the struct definition in the implementation file:
#include <stdlib.h> /* for malloc */
struct Point
{
int x, y;
};
Point * point_new(int x, int y)
{
Point *p;
if((p = malloc(sizeof *p)) != NULL)
{
p->x = x;
p->y = y;
}
return p;
}
In this latter case, you cannot return the Point by value, since its definition is hidden from users of the header file. This is a technique used widely in GTK+, for instance.
UPDATE Note that there are also highly regarded C projects where this use of typedef to hide struct is considered a bad idea; the Linux kernel is probably the best-known such project. See Chapter 5 of the Linux kernel CodingStyle document for Linus' angry words. :) My point is that the "should" in the question is perhaps not set in stone, after all.
It's amazing how many people get this wrong. PLEASE don't typedef structs in C, it needlessly pollutes the global namespace which is typically very polluted already in large C programs.
Also, typedef'd structs without a tag name are a major cause of needless imposition of ordering relationships among header files.
Consider:
#ifndef FOO_H
#define FOO_H 1
#define FOO_DEF (0xDEADBABE)
struct bar; /* forward declaration, defined in bar.h*/
struct foo {
struct bar *bar;
};
#endif
With such a definition, not using typedefs, it is possible for a compilation unit to include foo.h to get at the FOO_DEF definition. If it doesn't attempt to dereference the 'bar' member of the foo struct then there will be no need to include the "bar.h" file.
Also, since the namespaces are different between the tag names and the member names, it is possible to write very readable code such as:
struct foo *foo;
printf("foo->bar = %p", (void *)foo->bar);
Since the namespaces are separate, there is no conflict in naming variables coincident with their struct tag name.
If I have to maintain your code, I will remove your typedef'd structs.
From an old article by Dan Saks (http://www.ddj.com/cpp/184403396?pgno=3):
The C language rules for naming
structs are a little eccentric, but
they're pretty harmless. However, when
extended to classes in C++, those same
rules open little cracks for bugs to
crawl through.
In C, the name s appearing in
struct s
{
...
};
is a tag. A tag name is not a type
name. Given the definition above,
declarations such as
s x; /* error in C */
s *p; /* error in C */
are errors in C. You must write them
as
struct s x; /* OK */
struct s *p; /* OK */
The names of unions and enumerations
are also tags rather than types.
In C, tags are distinct from all other
names (for functions, types,
variables, and enumeration constants).
C compilers maintain tags in a symbol
table that's conceptually if not
physically separate from the table
that holds all other names. Thus, it
is possible for a C program to have
both a tag and an another name with
the same spelling in the same scope.
For example,
struct s s;
is a valid declaration which declares
variable s of type struct s. It may
not be good practice, but C compilers
must accept it. I have never seen a
rationale for why C was designed this
way. I have always thought it was a
mistake, but there it is.
Many programmers (including yours
truly) prefer to think of struct names
as type names, so they define an alias
for the tag using a typedef. For
example, defining
struct s
{
...
};
typedef struct s S;
lets you use S in place of struct s,
as in
S x;
S *p;
A program cannot use S as the name of
both a type and a variable (or
function or enumeration constant):
S S; // error
This is good.
The tag name in a struct, union, or
enum definition is optional. Many
programmers fold the struct definition
into the typedef and dispense with the
tag altogether, as in:
typedef struct
{
...
} S;
The linked article also has a discussion about how the C++ behavior of not requiring a typedef can cause subtle name-hiding problems. To prevent these problems, it's a good idea to typedef your classes and structs in C++, too, even though at first glance it appears to be unnecessary. In C++, with the typedef, the name hiding becomes an error that the compiler tells you about rather than a hidden source of potential problems.
Using a typedef avoids having to write struct every time you declare a variable of that type:
struct elem
{
int i;
char k;
};
elem user; // compile error!
struct elem user; // this is correct
One other good reason to always typedef enums and structs results from this problem:
enum EnumDef
{
FIRST_ITEM,
SECOND_ITEM
};
struct StructDef
{
enum EnuumDef MyEnum;
unsigned int MyVar;
} MyStruct;
Notice the typo in EnumDef in the struct (EnuumDef)? This compiles without error (or warning) and is (depending on the literal interpretation of the C Standard) correct. The problem is that I just created a new (empty) enumeration definition within my struct. I am not (as intended) using the previous definition EnumDef.
With a typedef, similar typos would have resulted in compiler errors for using an unknown type:
typedef enum
{
FIRST_ITEM,
SECOND_ITEM
} EnumDef;
typedef struct
{
EnuumDef MyEnum; /* compiler error (unknown type) */
unsigned int MyVar;
} StructDef;
StrructDef MyStruct; /* compiler error (unknown type) */
I would advocate ALWAYS typedef'ing structs and enumerations.
Not only to save some typing (no pun intended ;)), but because it is safer.
Linux kernel coding style Chapter 5 gives great pros and cons (mostly cons) of using typedef.
Please don't use things like "vps_t".
It's a mistake to use typedef for structures and pointers. When you see a
vps_t a;
in the source, what does it mean?
In contrast, if it says
struct virtual_container *a;
you can actually tell what "a" is.
Lots of people think that typedefs "help readability". Not so. They are useful only for:
(a) totally opaque objects (where the typedef is actively used to hide what the object is).
Example: "pte_t" etc. opaque objects that you can only access using the proper accessor functions.
NOTE! Opaqueness and "accessor functions" are not good in themselves. The reason we have them for things like pte_t etc. is that there really is absolutely zero portably accessible information there.
(b) Clear integer types, where the abstraction helps avoid confusion whether it is "int" or "long".
u8/u16/u32 are perfectly fine typedefs, although they fit into category (d) better than here.
NOTE! Again - there needs to be a reason for this. If something is "unsigned long", then there's no reason to do
typedef unsigned long myflags_t;
but if there is a clear reason for why it under certain circumstances might be an "unsigned int" and under other configurations might be "unsigned long", then by all means go ahead and use a typedef.
(c) when you use sparse to literally create a new type for type-checking.
(d) New types which are identical to standard C99 types, in certain exceptional circumstances.
Although it would only take a short amount of time for the eyes and brain to become accustomed to the standard types like 'uint32_t', some people object to their use anyway.
Therefore, the Linux-specific 'u8/u16/u32/u64' types and their signed equivalents which are identical to standard types are permitted -- although they are not mandatory in new code of your own.
When editing existing code which already uses one or the other set of types, you should conform to the existing choices in that code.
(e) Types safe for use in userspace.
In certain structures which are visible to userspace, we cannot require C99 types and cannot use the 'u32' form above. Thus, we use __u32 and similar types in all structures which are shared with userspace.
Maybe there are other cases too, but the rule should basically be to NEVER EVER use a typedef unless you can clearly match one of those rules.
In general, a pointer, or a struct that has elements that can reasonably be directly accessed should never be a typedef.
It turns out that there are pros and cons. A useful source of information is the seminal book "Expert C Programming" (Chapter 3). Briefly, in C you have multiple namespaces: labels, tags, member names and ordinary identifiers (typedef names, referred to below as the type namespace, belong to the ordinary identifiers). typedef introduces an alias for a type and places it among the ordinary identifiers, not in the tag namespace. Namely,
typedef struct Tag{
...members...
}Type;
defines two things. 1) Tag in the tag namespace and 2) Type in the type namespace. So you can do both Type myType and struct Tag myTagType. Declarations like struct Type myType or Tag myTagType are illegal. In addition, in a declaration like this:
typedef Type *Type_ptr;
we define a pointer to our Type. So if we declare:
Type_ptr var1, var2;
struct Tag *myTagType1, myTagType2;
then var1, var2 and myTagType1 are pointers to Type, but myTagType2 is not: it is a struct Tag object.
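The difference can be checked at compile time. This sketch assumes a C11 compiler for _Generic; the macro IS_TYPE_POINTER and the function count_pointers are hypothetical helpers added for the illustration, and the struct is given a single int member since the original elides its members.

```c
#include <assert.h>

typedef struct Tag { int value; } Type;
typedef Type *Type_ptr;

/* 1 if the expression has type "pointer to Type", else 0 (C11 _Generic). */
#define IS_TYPE_POINTER(x) _Generic((x), Type *: 1, default: 0)

static Type_ptr var1, var2;                 /* both are pointers      */
static struct Tag *myTagType1, myTagType2;  /* a pointer, then a struct */

/* Counts how many of the four variables are pointers to Type. */
static int count_pointers(void)
{
    /* myTagType2 is an object, so member access with '.' compiles. */
    assert(myTagType2.value == 0);
    return IS_TYPE_POINTER(var1) + IS_TYPE_POINTER(var2)
         + IS_TYPE_POINTER(myTagType1) + IS_TYPE_POINTER(myTagType2);
}
```

Here count_pointers() reports three pointers, because the * in the second declaration binds only to myTagType1.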
The above-mentioned book says that typedef'ing structs is not very useful, as it only saves the programmer from writing the word struct. However, like many other C programmers, I have an objection. Although a typedef can obscure some names (which is why it is not advisable in large code bases like the kernel), it helps a lot when you want to implement polymorphism in C. Example:
typedef struct MyWriter_t{
MyPipe super;
MyQueue relative;
uint32_t flags;
...
}MyWriter;
you can do:
void my_writer_func(MyPipe *s)
{
MyWriter *self = (MyWriter *) s;
uint32_t myFlags = self->flags;
...
}
So you can access an outer member (flags) through the inner struct (MyPipe) by casting. For me it is less confusing to cast to the typedef name than to write (struct MyWriter_t *) s; every time you want to do this. In these cases a brief name is a big deal, especially if you employ the technique heavily in your code.
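A compilable sketch of that cast follows. The definition of MyPipe is a hypothetical placeholder (the original does not show it) and the MyQueue member is omitted; the only property the technique relies on is that super is the first member of MyWriter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base type; only the layout matters here. */
typedef struct MyPipe { int fd; } MyPipe;

typedef struct MyWriter_t {
    MyPipe   super;    /* must be the first member for the cast below */
    uint32_t flags;
} MyWriter;

/* Receives the "base" pointer and recovers the enclosing MyWriter.
   Only valid when s really points at the super member of a MyWriter. */
static uint32_t my_writer_flags(MyPipe *s)
{
    MyWriter *self = (MyWriter *)s;
    assert((void *)&self->super == (void *)s);
    return self->flags;
}
```

C guarantees that a pointer to a structure, suitably converted, points to its first member and vice versa, which is what makes the cast well defined.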
Finally, the last aspect of typedef'ed types is the inability to extend them, in contrast to macros. If, for example, you have:
#define X char
or
typedef char Y;
you can then declare
unsigned X x;
but not
unsigned Y y;
This matters little for structs, because the restriction concerns type specifiers such as unsigned; type qualifiers (volatile and const) can still be applied to a typedef name.
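The macro/typedef contrast can be sketched as follows, assuming X and Y both stand for plain char (unsigned only combines with integer type names, so the illustration uses char rather than an array type). The variable and function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define X char        /* macro: plain text substitution               */
typedef char Y;       /* typedef: a complete type, cannot be extended */

/* "unsigned X" expands to "unsigned char", which is a valid type. */
static unsigned X macro_value = UCHAR_MAX;

/* unsigned Y y;   would not compile: a typedef name cannot be
   combined with another type specifier such as unsigned. */
static Y typedef_value = 'a';

/* Returns 1 when macro_value really holds the full unsigned range. */
static int macro_is_unsigned(void)
{
    assert(sizeof macro_value == sizeof typedef_value);
    return macro_value == UCHAR_MAX;
}
```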
I don't think forward declarations are possible with a typedef of a tagless struct. Tagged struct and union declarations allow forward declarations when the dependency (who knows about whom) is bidirectional.
Style:
Use of typedef in C++ makes quite a bit of sense. It can almost be necessary when dealing with templates that require multiple and/or variable parameters. The typedef helps keep the naming straight.
Not so in the C programming language. The use of typedef most often serves no purpose but to obfuscate the data structure usage. Since only a few keystrokes (struct: 6, enum: 4, union: 5) are needed to declare a data type, there is almost no use for aliasing the struct. Is that data type a union or a struct? The straightforward non-typedef'ed declaration lets you know right away what type it is.
Notice how Linux is written with strict avoidance of this aliasing nonsense typedef brings. The result is a minimalist and clean style.
Let's start with the basics and work our way up.
Here is an example of Structure definition:
struct point
{
int x, y;
};
Here the name point is optional.
A Structure can be declared during its definition or after.
Declaring during definition
struct point
{
int x, y;
} first_point, second_point;
Declaring after definition
struct point
{
int x, y;
};
struct point first_point, second_point;
Now, carefully note the last case above; you need to write struct point to declare Structures of that type if you decide to create that type at a later point in your code.
Enter typedef. If you intend to create new Structures (a Structure is a custom data type) later in your program from the same blueprint, using typedef in its definition might be a good idea, since it saves some typing from then on.
typedef struct point
{
int x, y;
} Points;
Points first_point, second_point;
A word of caution while naming your custom type
Nothing prevents you from using the _t suffix at the end of your custom type name, but the POSIX standard reserves the _t suffix for standard library type names.
The name you (optionally) give the struct is called the tag name and, as has been noted, is not a type in itself. To get to the type requires the struct prefix.
GTK+ aside, I'm not sure the tagname is used anything like as commonly as a typedef to the struct type, so in C++ that is recognised and you can omit the struct keyword and use the tagname as the type name too:
struct MyStruct
{
int i;
};
// The following is legal in C++:
MyStruct obj;
obj.i = 7;
typedef will not provide a co-dependent set of data structures. This you cannot do with typedef:
struct bar;
struct foo;
struct foo {
struct bar *b;
};
struct bar {
struct foo *f;
};
Of course you can always add:
typedef struct foo foo_t;
typedef struct bar bar_t;
What exactly is the point of that?
A>
A typedef aids in the meaning and documentation of a program by allowing the creation of more meaningful synonyms for data types. In addition, typedefs help parameterize a program against portability problems (K&R, pg147, C prog lang).
B>
A structure defines a type. Structs allow convenient grouping of a collection of vars, for ease of handling as a single unit (K&R, pg127, C prog lang.).
C>
typedef'ing a struct is explained in A above.
D> To me, structs are custom types or containers or collections or namespaces or complex types, whereas a typedef is just a means to create more nicknames.
In C, the keyword typedef is used to declare a new name for a type (struct, array, function, enum, ...). As an example, I will use a struct.
In C we often declare a struct outside of the main function. For example:
#include <stdio.h>
struct complex { int real_part, img_part; };
int main(void) {
struct complex number; /* number's type is struct complex */
number.real_part = 3;
number.img_part = -1;
printf("Number: %d.%d i \n", number.real_part, number.img_part);
return 0;
}
Each time I want to use this struct type I need to write 'struct complex name'. typedef simply gives that type a new name, which I can then use everywhere in my program. So our code will be:
typedef struct complex { int real_part, img_part; } COMPLEX;
/* COMPLEX is now another name for struct complex, so I can use it without
   the struct keyword needed in the first example ('struct complex number'). */
int main(void) {
COMPLEX number; /* number has the same type as in the first example */
number.real_part = 1;
number.img_part = 5;
printf("%d %d \n", number.real_part, number.img_part);
return 0;
}
If you have some type (struct, array, ...) that will be used throughout your program, you can simply give it a name using typedef.
Turns out in C99 typedef is required. C99 is outdated, but a lot of tools (à la HackerRank) use C99 as their plain C implementation, and typedef is required there.
I'm not saying they should change (maybe offer two C options); if the requirement changed, those of us studying for interviews on the site would be SOL.
In C, struct/union/enum are macro instructions processed by the C language preprocessor (not to be confused with the preprocessor that handles "#include" and the like),
so:
struct a
{
int i;
};
struct b
{
struct a;
int i;
int j;
};
struct b is expanded into something like this:
struct b
{
struct a
{
int i;
};
int i;
int j;
};
and so, at compile time, it is laid out on the stack as something like:
b:
int ai
int i
int j
That is also why it is difficult to have self-referential structs: the C preprocessor goes around a declaration loop that cannot terminate.
typedefs are type specifiers, which means that only the C compiler processes them, and it can do as it likes to optimise the generated assembly code. It also does not expand the members of the type blindly, as the preprocessor does with structs, but uses a more complex reference-construction algorithm, so a construction like:
typedef struct a A; //anticipated declaration for member declaration
typedef struct a //Implemented declaration
{
A* b; // member declaration
}A;
is permitted and fully functional. This implementation also gives access to compiler type conversion and removes some buggy effects when the execution thread leaves the scope of the initialisation functions.
This means that in C, typedefs are closer to C++ classes than plain structs are.

C empty struct -- what does this mean/do?

I found this code in a header file for a device that I need to use, and although I've been doing C for years, I've never run into this:
struct device {
};
struct spi_device {
struct device dev;
};
and it is used as in:
int spi_write_then_read(struct spi_device *spi,
const unsigned char *txbuf, unsigned n_tx,
unsigned char *rxbuf, unsigned n_rx);
and also here:
struct spi_device *spi = phy->spi;
where it is defined the same.
I'm not sure what the point of this definition is. It is in a header file for a Linux application for the board, but I am baffled by its use. Any explanations, ideas? Has anyone seen this before? (I'm sure some of you have :).
Thanks!
This is not C as C structures have to contain at least one named member:
(C11, 6.7.2.1 Structure and union specifiers p8) "If the struct-declaration-list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined."
but a GNU C extension:
GCC permits a C structure to have no members:
struct empty {
};
The structure has size zero
https://gcc.gnu.org/onlinedocs/gcc/Empty-Structures.html
I don't know what is the purpose of this construct in your example but in general I think it may be used as a forward declaration of the structure type. Note that in C++ it is allowed to have a class with no member.
In Linux 2.4 there is an example of an empty structure type used with conditional compilation, in the definition of the spinlock_t type alias (in include/linux/spinlock.h):
#if (DEBUG_SPINLOCKS < 1)
/* ... */
typedef struct { } spinlock_t;
#elif (DEBUG_SPINLOCKS < 2)
/* ... */
typedef struct {
volatile unsigned long lock;
} spinlock_t;
#else /* (DEBUG_SPINLOCKS >= 2) */
/* ... */
typedef struct {
volatile unsigned long lock;
volatile unsigned int babble;
const char *module;
} spinlock_t;
#endif
The purpose is to save some space without having to change the function APIs when DEBUG_SPINLOCKS < 1. It also makes it possible to define dummy (zero-sized) objects of type spinlock_t.
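The same pattern in miniature, as a sketch that relies on the GNU C empty-structure extension (so it is not ISO C). DEBUG_LOCKS, demo_lock_t, struct counter and the two functions are hypothetical names chosen for this illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define DEBUG_LOCKS 0   /* hypothetical build-time switch */

/* Empty when DEBUG_LOCKS is 0: a GNU C extension, not ISO C. */
typedef struct {
#if DEBUG_LOCKS
    unsigned long owner;
#endif
} demo_lock_t;

/* Callers embed the lock and call the same functions either way. */
struct counter {
    demo_lock_t lock;
    int value;
};

/* Placeholder lock operation; a real one would do work in debug builds. */
static void demo_lock(demo_lock_t *l) { assert(l != NULL); }

static int counter_increment(struct counter *c)
{
    demo_lock(&c->lock);
    return ++c->value;
}
```

With the switch off, GCC gives demo_lock_t size zero, so embedding it costs nothing while the calling code stays identical in both configurations.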
Another example in the (recent) Linux kernel of an empty structure hack used with conditional compilation in include/linux/device.h:
struct acpi_dev_node {
#ifdef CONFIG_ACPI
void *handle;
#endif
};
See the discussion with Greg Kroah-Hartman for this last example here:
https://lkml.org/lkml/2012/11/19/453
This is not standard C.
C11: 6.2.5-20:
— A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
....
— A structure or union is defined without any named members (including those
specified indirectly via anonymous structures and unions) (6.7.2.1).
GCC supports it as an extension (no more detail is given there about when or where it should be used). Using this in any program makes it compiler-specific.
One reason a library might do this is that the library developers do not want you to know about or interfere with the internals of these structs. In these cases they may provide an "interface" version of the structs spi_device/device (which is what you may be seeing) and have a second type definition that defines another version of said structs, with the actual members, for use inside the library.
Since you cannot access struct members or even create compatible structs of that type yourself with that approach (your compiler would not even know the actual size of the struct), this only works if the library itself creates the structs, only ever passes you pointers to them, and does not need you to modify any members.
If you add an empty struct as the first member of another struct, the empty
struct can serve as a "marker interface", i.e. when you cast a pointer to that
outer struct to a pointer of the inner struct and the cast succeeds you know
that the outer struct is "marked" as something.
Also, it might just be a placeholder for future development; I'm not too sure. Hope this helps.
This is valid C
struct empty;
struct empty *empty;
and facilitates use of addresses of opaque regions of memory.
Such addresses are usually obtained from and passed to library subroutines.
For example, something like this is done in stdio.h
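To make the opaque-pointer idea concrete, here is a minimal sketch. The counter type and its functions are hypothetical names invented for illustration; in a real library the first part would live in a public header and the second part in the library's own source file.

```c
#include <stdlib.h>

/* --- What a public header would expose (hypothetical API) --- */
struct counter;                             /* incomplete type: members and size are hidden */
struct counter *counter_new(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* --- What the library's own source file would contain --- */
struct counter {
    int count;                              /* visible only to the implementation */
};

struct counter *counter_new(void)
{
    /* calloc zero-initializes, so count starts at 0; returns NULL on failure */
    return calloc(1, sizeof(struct counter));
}

void counter_increment(struct counter *c)
{
    c->count++;
}

int counter_value(const struct counter *c)
{
    return c->count;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

Client code that only sees the declarations can hold and pass around a struct counter *, but it cannot take sizeof(struct counter) or touch count directly.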

Access struct members as if they are a single array?

I have two structures, with values that should compute a pondered average, like this simplified version:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;
And then I use them to calculate:
average = v_move*qtt_move + v_read*qtt_read + v_suck*qtt_suck + v_flush*qtd_flush + v_nop*qtd_nop + v_call*qtt_call;
Every now and then I need to include another variable. Now, for instance, I need to include v_clean and qtt_clean. I can't change the structures to arrays:
typedef struct
{
int v[6];
} values;
typedef struct
{
int qtt[6];
} quantities;
That would simplify my work a lot, but they are part of an API that needs the variable names to be clear.
So I'm looking for a way to access the members of those structures, maybe using sizeof(), so I can treat them as an array while keeping the API unchanged. All values are guaranteed to be int, but I can't guarantee the size of an int.
Writing the question came to my mind... Can a union do the job? Is there another clever way to automatize the task of adding another member?
Thanks,
Beco
What you are trying to do is not possible to do in any elegant way. It is not possible to reliably access consecutive struct members as an array. The currently accepted answer is a hack, not a solution.
The proper solution would be to switch to an array, regardless of how much work it is going to require. If you use enum constants for array indexing (as @digEmAll suggested in his now-deleted answer), the names and the code will be as clear as what you have now.
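A rough sketch of what the enum-indexed array version could look like. The enum names, the struct names values_arr/quantities_arr, and the helper weighted_sum are all made up for this illustration; they are not part of the original API.

```c
/* One enum constant per operation; OP_COUNT is automatically the array length. */
enum op { OP_MOVE, OP_READ, OP_SUCK, OP_FLUSH, OP_NOP, OP_CALL, OP_COUNT };

typedef struct { int v[OP_COUNT]; } values_arr;
typedef struct { int qtt[OP_COUNT]; } quantities_arr;

/* Sum of value * quantity over every operation. */
static int weighted_sum(const values_arr *v, const quantities_arr *q)
{
    int sum = 0;
    int i;

    for (i = 0; i < OP_COUNT; i++)
        sum += v->v[i] * q->qtt[i];
    return sum;
}
```

Adding a new operation then means adding one enum constant before OP_COUNT; individual fields are still addressed by name, e.g. v.v[OP_READ].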
If you still don't want to or can't switch to an array, the only more-or-less acceptable way to do what you are trying to do is to create an "index-array" or "map-array" (see below). C++ has a dedicated language feature that helps one implement it elegantly: pointers-to-members. In C you are forced to emulate that C++ feature using the offsetof macro:
static const size_t values_offsets[] = {
offsetof(values, v_move),
offsetof(values, v_read),
offsetof(values, v_suck),
/* and so on */
};
static const size_t quantities_offsets[] = {
offsetof(quantities, qtt_move),
offsetof(quantities, qtt_read),
offsetof(quantities, qtt_suck),
/* and so on */
};
And if now you are given
values v;
quantities q;
and index
int i;
you can generate the pointers to individual fields as
int *pvalue = (int *) ((char *) &v + values_offsets[i]);
int *pquantity = (int *) ((char *) &q + quantities_offsets[i]);
*pvalue += *pquantity;
Of course, you can now iterate over i in any way you want. This is also far from elegant, but at least it offers some degree of reliability and validity, as opposed to any ugly hack. The whole thing can be made more elegant by wrapping the repetitive pieces in appropriately named functions/macros.
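Putting the pieces together, a complete version of the offset-table approach might look like the sketch below. The struct definitions are copied from the question; the helper name weighted_sum_by_offset and the FIELD_COUNT macro are assumptions added for illustration.

```c
#include <stddef.h>

typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

typedef struct {
    int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;

/* Byte offset of each field, in matching order in both tables. */
static const size_t values_offsets[] = {
    offsetof(values, v_move),  offsetof(values, v_read), offsetof(values, v_suck),
    offsetof(values, v_flush), offsetof(values, v_nop),  offsetof(values, v_call),
};
static const size_t quantities_offsets[] = {
    offsetof(quantities, qtt_move),  offsetof(quantities, qtt_read),
    offsetof(quantities, qtt_suck),  offsetof(quantities, qtd_flush),
    offsetof(quantities, qtd_nop),   offsetof(quantities, qtt_call),
};

#define FIELD_COUNT (sizeof values_offsets / sizeof values_offsets[0])

/* Sum of value * quantity over every field, without assuming the
   members are laid out contiguously. */
static int weighted_sum_by_offset(const values *v, const quantities *q)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < FIELD_COUNT; i++) {
        const int *pv = (const int *)((const char *)v + values_offsets[i]);
        const int *pq = (const int *)((const char *)q + quantities_offsets[i]);
        sum += *pv * *pq;
    }
    return sum;
}
```

Adding v_clean/qtt_clean then requires one new struct member and one new table entry each; the loop itself does not change.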
If all members are guaranteed to be of type int, you can use a pointer to int and index from it:
values v; /* filled in elsewhere */
quantities q; /* filled in elsewhere */
int *value = &v.v_move;
int *quantity = &q.qtt_move;
size_t i;
average = 0;
// although it should work, a good practice many times IMHO is to add a null as the last member in the struct and change the condition to quantity[i] != null.
for (i = 0; i < sizeof(quantities) / sizeof(*quantity); i++)
average += value[i] * quantity[i];
(Since the order of members in a struct is guaranteed to be as declared)
Writing the question came to my mind... Can a union do the job? Is there another clever way to automatize the task of adding another member?
Yes, a union can certainly do the job:
union
{
values v; /* As defined by OP */
int array[6];
} u;
You can use a pointer to u.v in your API, and work with u.array in your code.
Personally, I think that all the other answers break the rule of least surprise. When I see a plain struct definition, I assume that the structure will be accessed using normal access methods. With a union, it's clear that the application will access it in special ways, which prompts me to pay extra attention to the code.
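A minimal sketch of the union approach, with a compile-time check for the one assumption it depends on: that the struct has no padding, so its members line up exactly with the array elements. The names values_view, VALUES_COUNT and sum_values are hypothetical, and _Static_assert requires C11.

```c
typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

#define VALUES_COUNT 6

/* Refuse to compile if the compiler inserted padding, because the
   array view below would then not match the named members. */
_Static_assert(sizeof(values) == VALUES_COUNT * sizeof(int),
               "values contains padding; the array view is not valid");

typedef union {
    values named;               /* what the API hands out */
    int array[VALUES_COUNT];    /* what the internal code iterates over */
} values_view;

/* Sum all fields through the array view. */
static int sum_values(const values_view *u)
{
    int sum = 0;
    int i;

    for (i = 0; i < VALUES_COUNT; i++)
        sum += u->array[i];
    return sum;
}
```

When a member is added to the struct, VALUES_COUNT has to be bumped as well; the static assertion catches the case where one is updated without the other.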
It really sounds as if this should have been an array since the beginning, with accessor methods or macros enabling you to still use pretty names like move, read, etc. However, as you mentioned, this isn't feasible due to API breakage.
The two solutions that come to my mind are:
Use a compiler specific directive to ensure that your struct is packed (and thus, that casting it to an array is safe)
Evil macro black magic.
How about using __attribute__((packed)) if you are using gcc?
So you could declare your structures as:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} __attribute__((packed)) values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} __attribute__((packed)) quantities;
According to the gcc manual, your structures will then use the minimum amount of memory possible for storing the structure, omitting any padding that might have normally been there. The only issue would then be to determine the sizeof(int) on your platform which could be done through either some compiler macros or using <stdint.h>.
One more thing is that there will be a performance penalty for unpacking and re-packing the structure when it needs to be accessed and then stored back into memory. But at least you can be assured then that the layout is consistent, and it could be accessed like an array using a cast to a pointer type like you were wanting (i.e., you won't have to worry about padding messing up the pointer offsets).
Thanks,
Jason
This problem is common and has been solved in many ways in the past. None of them is completely safe or clean. It depends on your particular application. Here's a list of possible solutions:
1) You can redefine your structures so fields become array elements, and use macros to map each particular element as if it was a structure field. E.g:
struct values { int varray[6]; };
#define v_read varray[1]
The disadvantage of this approach is that most debuggers don't understand macros. Another problem is that in theory a compiler could choose a different alignment for the original structure and the redefined one, so binary compatibility is not guaranteed.
2) Count on the compiler's behaviour and treat all the fields as if they were array fields (oops, while I was writing this, someone else wrote the same - +1 for him)
3) Create a static array of element offsets (initialized at startup) and use them to "map" the elements. It's quite tricky, and not so fast, but has the advantage that it's independent of the actual disposition of the fields in the structure. Example (incomplete, just for clarification):
int positions[10];
positions[0] = ((char *)(&((values*)NULL)->v_move)-(char *)NULL);
positions[1] = ((char *)(&((values*)NULL)->v_read)-(char *)NULL);
//...
values *v = ...;
int vread;
vread = *(int *)(((char *)v)+position[1]);
Ok, not at all simple. Macros like "offsetof" may help in this case.

Representing dynamic typing in C

I'm writing a dynamically-typed language. Currently, my objects are represented in this way:
struct Class { struct Class* class; struct Object* (*get)(struct Object*,struct Object*); };
struct Integer { struct Class* class; int value; };
struct Object { struct Class* class; };
struct String { struct Class* class; size_t length; char* characters; };
The goal is that I should be able to pass everything around as a struct Object* and then discover the type of the object by comparing the class attribute. For example, to cast an integer for use I would simply do the following (assume that integer is of type struct Class*):
struct Object* foo = bar();
// increment foo
if(foo->class == integer)
((struct Integer*)foo)->value++;
else
handleTypeError();
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
There are alternatives to this approach:
struct Object
{
struct Class* class;
union Value
{
struct Class c;
int i;
struct String s;
} value;
};
The problem here is that the union uses up as much space as the size of the largest thing that can be stored in the union. Given that some of my types are many times as large as my other types, this would mean that my small types (int) would take up as much space as my large types (map) which is an unacceptable tradeoff.
struct Object
{
struct Class* class;
void* value;
};
This creates a level of redirection that will slow things down. Speed is a goal here.
The final alternative is to pass around void*s and manage the internals of the structure myself. For example, to implement the type test mentioned above:
void* foo = bar();
// increment foo
if(*((struct Class**) foo) == integer)
(*((int*)((char*)foo + sizeof(struct Class*))))++;
else
handleTypeError();
This gives me everything I want (portability, different sizes for different types, etc.) but has at least two downsides:
Hideous, error-prone C. The code above only calculates a single-member offset; it will get much worse with types more complex than integers. I might be able to alleviate this a bit using macros, but this will be painful no matter what.
Since there is no struct that represents the object, I don't have the option of stack allocations (at least without implementing my own stack on the heap).
Basically, my question is, how can I get what I want without paying for it? Is there a way to be portable, have variance in size for different types, not use redirection, and keep my code pretty?
EDIT: This is the best response I've ever received for an SO question. Choosing an answer was hard. SO only allows me to choose one answer so I chose the one that led me to my solution, but you all received upvotes.
See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C. The Python solution can be directly applied to your problem. Essentially you want to do this:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
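As a rough illustration of the cast in both directions, here is a small sketch. The struct Class is simplified to a single name field, and integer_class, string_class and increment are hypothetical names chosen for the example.

```c
struct Class { const char *name; };          /* simplified for the sketch */
struct Object { struct Class *class; };
struct Integer { struct Object object; int value; };

static struct Class integer_class = { "Integer" };
static struct Class string_class  = { "String" };

/* Increment the object if it is an Integer.
   Returns 1 on success, 0 on a type mismatch. */
static int increment(struct Object *obj)
{
    if (obj->class != &integer_class)
        return 0;
    /* Valid because struct Object is the first member of struct Integer,
       so a pointer to one, suitably converted, points to the other. */
    ((struct Integer *)obj)->value++;
    return 1;
}
```

A caller can pass either &i.object or (struct Object *)&i for a struct Integer i; both refer to the same address.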
C gives you sufficient guarantees that your first approach will work. The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:
union allow_aliasing {
struct Class class;
struct Object object;
struct Integer integer;
struct String string;
};
(You don't need to ever use the union for anything - it just has to be in scope)
I believe the relevant part of the standard is this:
[#5] With one exception, if the value
of a member of a union object is used
when the most recent store to the
object was to a different member, the
behavior is implementation-defined.
One special guarantee is made in order
to simplify the use of unions: If a
union contains several structures that
share a common initial sequence (see
below), and if the union object
currently contains one of these
structures, it is permitted to inspect
the common initial part of any of them
anywhere that a declaration of the
completed type of the union is
visible. Two structures share a common
initial sequence if corresponding
members have compatible types (and,
for bit-fields, the same widths) for a
sequence of one or more initial
members.
(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
There are 3 major approaches for implementing dynamic types and which one is best depends on the situation.
1) C-style inheritance: The first one is shown in Josh Haberman's answer. We create a type-hierarchy using classic C-style inheritance:
struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };
Functions with dynamically typed arguments receive them as Object*, inspect the class member, and cast as appropriate. The cost to check the type is two pointer hops. The cost to get the underlying value is one pointer hop. In approaches like this one, objects are typically allocated on the heap since the size of objects is unknown at compile time. Since most malloc implementations allocate a minimum of 32 bytes at a time, small objects can waste a significant amount of memory with this approach.
2) Tagged union: We can remove a level of indirection for accessing small objects using the "short string optimization"/"small object optimization":
struct Object {
struct Class* class;
union {
// fundamental C types or other small types of interest
bool as_bool;
int as_int;
// [...]
// object pointer for large types (or actual pointer values)
void* as_ptr;
};
};
Functions with dynamically typed arguments receive them as Object, inspect the class member, and read the union as appropriate. The cost to check the type is one pointer hop. If the type is one of the special small types, it is stored directly in the union, and there is no indirection to retrieve the value. Otherwise, one pointer hop is required to retrieve the value. This approach can sometimes avoid allocating objects on the heap. Although the exact size of an object still isn't known at compile time, we now know the size and alignment (our union) needed to accommodate small objects.
In these first two solutions, if we know all the possible types at compile time, we can encode the type using an integer type instead of a pointer and reduce type check indirection by one pointer hop.
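A minimal sketch of that integer-tag variant of the tagged union, assuming the full set of types is known up front. The enum Tag, the union member name u, and the helper functions are hypothetical; the union is named rather than anonymous so the code does not depend on C11.

```c
#include <stdbool.h>

enum Tag { TAG_BOOL, TAG_INT, TAG_PTR };

struct Object {
    enum Tag tag;           /* integer tag instead of a class pointer */
    union {
        bool as_bool;
        int as_int;
        void *as_ptr;       /* large types live behind this pointer */
    } u;
};

static struct Object make_int(int i)
{
    struct Object o;
    o.tag = TAG_INT;
    o.u.as_int = i;
    return o;
}

static struct Object make_bool(bool b)
{
    struct Object o;
    o.tag = TAG_BOOL;
    o.u.as_bool = b;
    return o;
}

/* Type check is a plain integer compare, with no pointer hop.
   Returns false on a type mismatch. */
static bool increment_tagged(struct Object *o)
{
    if (o->tag != TAG_INT)
        return false;
    o->u.as_int++;
    return true;
}
```

Small values are passed and returned by value here, so no heap allocation is needed for them.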
3) NaN-boxing: Finally, there's NaN-boxing, where every object handle is only 64 bits.
double object;
Any value corresponding to a non-NaN double is understood to simply be a double. All other object handles are a NaN. There are actually large swaths of bit values of double precision floats that correspond to NaN in the commonly used IEEE-754 floating point standard. In the space of NaNs, we use a few bits to tag types and the remaining bits for data. By taking advantage of the fact that most 64-bit machines actually only have a 48-bit address space, we can even stash pointers in NaNs. This method incurs no indirection or extra memory use but constrains our small object types, is awkward, and in theory is not portable C.
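To give a feel for the bit manipulation involved, here is a deliberately small sketch that boxes only doubles and 32-bit integers. It assumes an IEEE-754 binary64 double, and the specific tag bit (INT_TAG) is an arbitrary choice made for this illustration; real implementations pick their own layouts and must also canonicalize NaNs produced by arithmetic so they cannot collide with boxed values.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;     /* one 64-bit handle per object */

/* Quiet-NaN pattern: exponent all ones plus the top mantissa bit. */
#define QNAN_BITS UINT64_C(0x7ff8000000000000)
/* Hypothetical tag bit in the NaN payload marking a boxed 32-bit int. */
#define INT_TAG   UINT64_C(0x0001000000000000)
#define INT_BOX   (QNAN_BITS | INT_TAG)

/* Doubles are stored as their own bit pattern (assumes 64-bit double). */
static Value box_double(double d)
{
    Value v;
    memcpy(&v, &d, sizeof v);
    return v;
}

static double unbox_double(Value v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* A handle is a boxed int when all quiet-NaN bits and the tag bit are set. */
static int is_boxed_int(Value v)
{
    return (v & INT_BOX) == INT_BOX;
}

/* The int's 32 bits go into the low half of the NaN payload. */
static Value box_int(int32_t i)
{
    uint32_t bits;
    memcpy(&bits, &i, sizeof bits);
    return INT_BOX | bits;
}

static int32_t unbox_int(Value v)
{
    uint32_t bits = (uint32_t)(v & UINT64_C(0xffffffff));
    int32_t i;
    memcpy(&i, &bits, sizeof i);
    return i;
}
```

memcpy is used instead of pointer casts to move bits between types, which sidesteps the aliasing concerns discussed in the other answers.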
Section 6.2.5 of ISO 9899:1999 (the C99 standard) says:
A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
Section 6.7.2.1 also says:
As discussed in 6.2.5, a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence, and a union is a type consisting of a sequence of members whose storage overlap.
[...]
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
This guarantees what you need.
In the question you say:
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works.
This will work on all platforms. It also means that your first alternative - what you are currently using - is safe enough.
But on another platform struct Integer might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
No compliant compiler is allowed to do that. [I replaced String by Integer assuming you were referring to the first set of declarations. On closer examination, you might have been referring to the structure with an embedded union. The compiler still isn't allowed to reorder class and value.]
The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct String might store value before class and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.
I believe you're wrong here. First, because your struct String doesn't have a value member. Second, because I believe C does guarantee the layout in memory of your struct's members. That's why the following are different sizes:
struct {
short a;
char b;
char c;
};
struct {
char a;
short b;
char c;
};
If C made no guarantees, then compilers would probably optimize both of those to be the same size. But it guarantees the internal layout of your structs, so the natural alignment rules kick in and make the second one larger than the first.
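You can check this on your own platform with sizeof and offsetof. The sketch below gives the two structs hypothetical tags (tight and loose) so they can be named; the exact sizes printed are implementation-defined, but the declaration order of the members is not.

```c
#include <stddef.h>
#include <stdio.h>

struct tight { short a; char b; char c; };   /* members pack without gaps on typical ABIs */
struct loose { char a; short b; char c; };   /* b usually needs padding before it */

/* Print the size and member offsets of both layouts. */
static void print_layouts(void)
{
    printf("tight: size=%zu a=%zu b=%zu c=%zu\n",
           sizeof(struct tight),
           offsetof(struct tight, a),
           offsetof(struct tight, b),
           offsetof(struct tight, c));
    printf("loose: size=%zu a=%zu b=%zu c=%zu\n",
           sizeof(struct loose),
           offsetof(struct loose, a),
           offsetof(struct loose, b),
           offsetof(struct loose, c));
}
```

Whatever the sizes turn out to be, the offsets always increase in declaration order and the first member is always at offset 0, which is the guarantee the question depends on.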
I appreciate the pedantic issues raised by this question and answers, but I just wanted to mention that CPython has used similar tricks "more or less forever" and it's been working for decades across a huge variety of C compilers. Specifically, see object.h, macros like PyObject_HEAD, structs like PyObject: all kinds of Python Objects (down at the C API level) are getting pointers to them forever cast back and forth to/from PyObject* with no harm done. It's been a while since I last played sea lawyer with an ISO C Standard, to the point that I don't have a copy handy (!), but I do believe that there are some constraints there that should make this keep working as it has for nearly 20 years...

Resources