I'm wondering if there's a better way to handle pointers to multiple structs when you use the structs to overlay data you didn't create yourself. I'm trying to parse an ELF file header (the format is well-known, so I won't reproduce the structs here). So, say you have a pointer to a struct for the file header:
struct elf64_file_hdr *fh;
One of the fields of the file header is `shoff', which is the offset in bytes from the start of the file to the beginning of section headers, or to be more specific, the beginning of an array of section header structs. So, you can have:
struct elf64_sec_hdr *sh;
and access each section header as sh[0], sh[1], etc. The question, then, is how to set `sh' correctly. I've been doing a cast and then recast to make the pointer math work:
sh = (struct elf64_sec_hdr *)((char *)fh + fh->fh_shoff);
But it seems like there must be a more elegant way to do this.
I don't think there is, beyond wrapping it in a macro. But that doesn't seem too bad. Pointer arithmetic is scaled by the size of the pointed-to type, so to add a byte offset you first have to cast away the initial type (to char *), then cast the result to the type you want. That's exactly what you're doing.
Here's a macro to do it, casting the result to void* so it will automatically convert to the right type.
#define AT_OFFSET(base, offset) ((void*)((char*)(base) + (offset)))
...
struct elf64_sec_hdr *sh = AT_OFFSET(fh, fh->fh_shoff);
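As a rough sketch of how the macro might be used with a sanity check, the following wraps it in a helper that refuses offsets pointing outside the file image. The two structs are hypothetical, heavily trimmed stand-ins (the real ELF64 headers have many more fields), and the fh_shnum field name and the section_headers helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-ins for the real ELF64 structs. */
struct elf64_file_hdr { uint64_t fh_shoff; uint16_t fh_shnum; };
struct elf64_sec_hdr  { uint32_t sh_name;  uint32_t sh_type;  };

#define AT_OFFSET(base, offset) ((void *)((char *)(base) + (offset)))

/* Returns the section header table inside the file image that starts at fh,
 * or NULL if the table would extend past file_size bytes. */
struct elf64_sec_hdr *section_headers(struct elf64_file_hdr *fh, size_t file_size)
{
    assert(fh != NULL);
    uint64_t table_size = (uint64_t)fh->fh_shnum * sizeof(struct elf64_sec_hdr);
    if (fh->fh_shoff > file_size || table_size > file_size - fh->fh_shoff)
        return NULL;
    return AT_OFFSET(fh, fh->fh_shoff);
}
```

The bounds check matters because the offset comes from the file itself, which may be truncated or malformed.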
I have been writing C for a decent amount of time, and am obviously aware that C has no support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without the use of any macros or voodoo, and I am looking to gain more insight into possible issues I may have overlooked.
The folder structure isn't all that important here but I'll list it anyway because it gives clarity as to the import names (and is also what CLion generates for me).
- example-project
  - cmake-build-debug
  - example-lib-name
    - include
      - example-lib-name
        - example-header-file.h
    - src
      - example-lib-name
        - example-source-file.c
    - CMakeLists.txt
  - CMakeLists.txt
  - main.c
Let's say that example-header-file.h contains:
typedef struct ExampleStruct {
int data;
} ExampleStruct;
ExampleStruct* new_example_struct(int, double);
which just contains a definition for a struct and a function that returns a pointer to an ExampleStruct.
Obviously, now if I import ExampleStruct into another file, such as main.c, I will be able to create and return a pointer to an ExampleStruct by calling
ExampleStruct* new_struct = new_example_struct(<int>, <double>);
and will be able to access the data property like: new_struct->data.
However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want its internals to be easy to modify. E.g., if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.
So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.
In our implementation file (example-source-file.c), let's say we have the following code:
#include <stdlib.h>
#include <stdbool.h>
typedef struct ExampleStruct {
int data;
double val;
} ExampleStruct;
ExampleStruct* new_example_struct(int data, double val) {
    ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
    if (example_struct == NULL) return NULL; // allocation failed
    example_struct->data=data;
    example_struct->val=val;
    return example_struct;
}
double get_val(ExampleStruct* e) {
return e->val;
}
This file simply implements that constructor method for getting a new pointer to an ExampleStruct that was defined in the header file. However, this file also defines its own version of ExampleStruct, that has a new member field not present in the header file's definition: double val, as well as a getter which gets that value. Now, if I import the same header file into main.c, which contains:
#include <stdio.h>
#include "example-lib-name/example-header-file.h"
int main() {
printf("Hello, World!\n");
ExampleStruct* test = new_example_struct(6, 7.2);
printf("%d\n", test->data); // <-- THIS WORKS
double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
printf("%f\n", x); //
// printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
return 0;
}
I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.
Some things I see that may be cause for concern:
This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.
Questions I'd like answers on:
Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Yes: your approach produces undefined behavior.
C requires that
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(C17 6.2.7/2)
and that
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
[...]
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")
Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions relying on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between TUs using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
Other alternatives:
Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
As long as the public has a complete definition for ExampleStruct, callers can write code like:
ExampleStruct a = *new_example_struct(42, 1.234);
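Then the following will certainly fail, because a is a copy made with the public definition and does not contain the hidden val member: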
printf("%g\n", get_val(&a));
I recommend instead creating an opaque pointer and providing public accessor functions for the information in .data and .val.
Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)
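A minimal sketch of that opaque-pointer model, shown in one listing for brevity, might look like this. The function names (example_new, example_data, example_val, example_free) are assumptions made for illustration and are not from the question. In a real project the first part would live in the public header and the second part in the .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public header: the struct is declared but never defined here ---- */
typedef struct ExampleStruct ExampleStruct;
ExampleStruct *example_new(int data, double val);
int example_data(const ExampleStruct *e);
double example_val(const ExampleStruct *e);
void example_free(ExampleStruct *e);

/* ---- implementation file: the only place that sees the members ---- */
struct ExampleStruct {
    int data;
    double val;
};

ExampleStruct *example_new(int data, double val)
{
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->data = data;
    e->val = val;
    return e;
}

int example_data(const ExampleStruct *e) { assert(e != NULL); return e->data; }
double example_val(const ExampleStruct *e) { assert(e != NULL); return e->val; }
void example_free(ExampleStruct *e) { free(e); }
```

Because callers only ever see the incomplete type, they cannot copy an instance by value or touch any member directly. There is only one definition of the struct, so the compatibility problem does not arise.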
Are there significant performance penalties I may suffer as a result of writing code this way?
Probably:
Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.
i.e. is there a simpler way to do this
Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:
#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)
typedef struct ExampleStruct {
int data;
_Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, _Alignas as well (if you want the private data not to need _Alignas, you will need to use the real alignment quantum for the type-erased version).
The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
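One possible way to read and write such type-erased storage from the implementation side is sketched below. The struct example_private layout and the example_init / example_get_val names are hypothetical. memcpy is used instead of a pointer cast as a conservative choice that sidesteps aliasing questions, and a static assertion guards against the byte array being too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

/* Hypothetical private layout, known only to the implementation file. */
struct example_private { double val; };

static_assert(sizeof(struct example_private) <= EXAMPLE_STRUCT_PRIVATE_DATA_SIZE,
              "private_data is too small for the private fields");

void example_init(ExampleStruct *e, int data, double val)
{
    struct example_private p = { val };
    e->data = data;
    memcpy(e->private_data, &p, sizeof p);   /* copy private state in */
}

double example_get_val(const ExampleStruct *e)
{
    struct example_private p;
    memcpy(&p, e->private_data, sizeof p);   /* copy private state out */
    return p.val;
}
```

Unlike the opaque-pointer approach, instances can live on the stack or in arrays, at the cost of baking the private size into the public header.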
An example of the use of such an approach can be found in CUDA's driver API:
Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_peer
The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.
This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.
In example.h, one defines the public-facing elements. struct example is not meant to be instantiated; in a sense, it is abstract. Only pointers that are obtained from one of its constructors (in this case, the only one) are valid.
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)
#include <stddef.h> /* offsetof */
#include <stdlib.h>
#include "example.h"
struct private_example {
struct example public;
double val;
};
struct example *new_example(int data, double val) {
struct private_example *const example = malloc(sizeof *example);
if(!example) return 0;
example->public.data = data;
example->val = val;
return &example->public;
}
/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
return (struct private_example *)(void *)
((char *)example - offsetof(struct private_example, public));
}
double example_val(struct example *e) {
return example_upcast(e)->val;
}
Then one can use the object as in main.c. This is used frequently in Linux kernel code for container abstraction. Note that offsetof(struct private_example, public) is zero, so example_upcast does nothing and a cast is sufficient: ((struct private_example *)e)->val. If one builds structures in a way that always allows casting, one is limited to single inheritance.
I have a question about some code in Eric Roberts' Programming Abstractions in C. He uses several libraries of his own both to simplify things for readers and to teach how to write libraries. (All of the library code for the book can be found on this site.)
One library, genlib, provides a macro for generic allocation of a pointer to a struct type. I don't understand part of the macro. I'll copy the code below, plus an example of how it is meant to be used, then I'll explain my question in more detail.
/*
* Macro: New
* Usage: p = New(pointer-type);
* -----------------------------
* The New pseudofunction allocates enough space to hold an
* object of the type to which pointer-type points and returns
* a pointer to the newly allocated pointer. Note that
* "New" is different from the "new" operator used in C++;
* the former takes a pointer type and the latter takes the
* target type.
*/
#define New(type) ((type) GetBlock(sizeof *((type) NULL)))
/* GetBlock is a wrapper for malloc. It encapsulates the
* common sequence of malloc, check for NULL, return or
* error out, depending on the NULL check. I'm not going
* to copy that code since I'm pretty sure it isn't
* relevant to my question. It can be found here though:
* ftp://ftp.awl.com/cseng/authors/roberts/cs1-c/standard/genlib.c
*/
Roberts intends for the code to be used as follows:
typedef struct {
string name;
/* etc. */
} *employeeT;
employeeT emp;
emp = New(employeeT);
He prefers to use a pointer to the record as the type name, rather than the record itself. So New provides a generic way to allocate such struct records.
In the macro New, what I don't understand is this: sizeof *((type) NULL). If I'm reading that correctly, it says "take the size of the dereferenced cast of NULL to whatever struct type type represents in a given call". I think I understand the dereferencing: we want to allocate enough space for the struct; the size of the pointer is not what we need, so we dereference to get at the size of the underlying record-type. But I don't understand the idea of casting NULL to a type.
My questions:
You can cast NULL? What does that even mean?
Why is the cast necessary? When I tried removing it, the compiler says error: expected expression. So, sizeof *(type) is not an expression? That confused me since I can do the following to get the sizes of arbitrary pointers-to-structs:
#define struct_size(s_ptr) do { \
printf("sizeof dereferenced pointer to struct %s: %zu\n", \
#s_ptr, sizeof *(s_ptr)); \
} while(0)
Edit: As many people point out below, the two examples aren't the same:
/* How genlib uses the macro. */
New(struct MyStruct*)
/* How I was using my macro. */
struct MyStruct *ptr; struct_size(ptr)
For the record, this isn't homework. I'm an amateur trying to improve at C. Also, there's no problem with the code, as far as I can tell. That is, I'm not asking how I can do something different with it. I'm just trying to better understand (1) how it works and (2) why it must be written the way it is. Thanks.
The issue is that the macro needs to get the size of the type pointed at by the pointer type.
As an example, suppose that you have the pointer type struct MyStruct*. Without removing the star from this expression, how would you get the size of struct MyStruct? You couldn't write
sizeof(*(struct MyStruct*))
since that's not legal C code.
On the other hand, if you had a variable of type struct MyStruct*, you could do something like this:
struct MyStruct* uselessPointer;
sizeof(*uselessPointer);
Since sizeof doesn't actually evaluate its argument (it just determines the static size of the type of the expression), this is safe.
Of course, in a macro, you can't define a new variable. However, you could make up a random pointer to a struct MyStruct* by casting an existing pointer. Here, NULL is a good candidate - it's an existing pointer that you can legally cast to a struct MyStruct*. Therefore, if you were to write
sizeof(* ((struct MyStruct*)NULL))
the code would
Cast NULL to a struct MyStruct*, yielding a pointer of static type struct MyStruct*.
Determine the size of the object that would be formed by dereferencing the pointer. Since the pointer has type struct MyStruct*, it points at an object of type struct MyStruct, so this yields the size of struct MyStruct.
In other words, it's a simple way to get an object of the pointer type so that you can dereference it and obtain an object of the underlying type.
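The idea can be sketched in isolation as follows. POINTEE_SIZE and NEW are hypothetical names, and NEW is a simplified stand-in that calls malloc directly rather than the book's GetBlock. The static assertion shows that the size is worked out entirely at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct MyStruct { int id; double weight; };

/* Size of the type that ptr_type points to. The operand of sizeof is not
 * evaluated, so the null pointer is never actually dereferenced. */
#define POINTEE_SIZE(ptr_type) (sizeof *((ptr_type)NULL))

/* Simplified stand-in for the book's New: plain malloc, no GetBlock. */
#define NEW(ptr_type) ((ptr_type)malloc(POINTEE_SIZE(ptr_type)))

/* Checked at compile time: nothing here runs or touches memory. */
static_assert(POINTEE_SIZE(struct MyStruct *) == sizeof(struct MyStruct),
              "POINTEE_SIZE should match the pointee type");
```

A caller would write struct MyStruct *p = NEW(struct MyStruct *); and should still check p against NULL, since this simplified version has no error handling.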
I've worked with Eric on some other macros and he is a real pro with the preprocessor. I'm not surprised that this works, and I'm not surprised that it's tricky, but it certainly is clever!
As a note - in C++, this sort of trick used to be common until the introduction of the std::declval utility (a function template), which is a less-hacky version of this operation.
Hope this helps!
It's a hack. It relies on the fact that the argument to the sizeof operator isn't actually evaluated.
To answer your specific questions:
Yes. NULL is a null pointer constant, and like any other pointer value it may be cast to another pointer type.
sizeof operates on either a type or an expression. *(type) would be neither (after macro substitution has occurred); it would be a syntax error.
Can someone please help me understand what this is doing:
alt_up_sd_card_dev *dev = (alt_up_sd_card_dev *) alt_find_dev(name, &alt_dev_list);
if (dev != NULL)
{
aux_status_register = ((short int *) SD_CARD_AUX_STATUS(dev->base));
}
I understand that the (short int *) is "type-casting" (as explained to me by some other helpful people on this forum) what SD_CARD_AUX_STATUS should be when the contents are called, but I've never seen the dev->base syntax before....
1. Here, dev is a pointer to a structure. It is assigned the address returned by this line: (alt_up_sd_card_dev *) alt_find_dev(name, &alt_dev_list);
2. The structure alt_up_sd_card_dev has a member called base.
3. SD_CARD_AUX_STATUS is probably a macro that does some manipulation on dev->base.
For more information, look up parameterized (function-like) macros in C.
We can't give you a proper answer without knowing all the include files this references, but by general convention:
1) By standard C naming conventions, the all-uppercase SD_CARD_AUX_STATUS() is a macro rather than a function. The macro is set up by a #define either earlier in this file or in one of the #included .h files. Look for that definition to find out what it's actually doing.
2) -> is like . but for pointers-to-structures rather than structures. That is, if you have a struct { int foo, bar; } baz, then baz.foo is the same thing as (&baz)->foo. Or, as Wikipedia puts it:
Structure dereference ("member b of object pointed to by a") a->b
Structure reference ("member b of object a") a.b
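A small sketch of the equivalence, using the struct from the paragraph above (the struct tag pair and the function name sum_via_pointer are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct pair { int foo, bar; };

/* Reads both members through a pointer, once with -> and once with (*p). */
int sum_via_pointer(const struct pair *p)
{
    assert(p != NULL);
    return p->foo + (*p).bar;   /* p->bar and (*p).bar mean the same thing */
}
```

Given struct pair baz = { 3, 4 };, the expressions baz.foo, (&baz)->foo, and (*&baz).foo all name the same member.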
This is not related to C syntax in general. This piece of code is very specific.
I can only guess what it does.
alt_up_sd_card_dev *dev = (alt_up_sd_card_dev *) alt_find_dev(name, &alt_dev_list);
This calls a function alt_find_dev which probably looks for a device. The device is apparently an sd card reader... The result of the function is cast to a specific type of pointer. Probably the result is of generic pointer type and it is cast to a pointer to a structure that describes specifically an sd card device. It is then stored in the dev variable.
if (dev != NULL)
if the device is found....
aux_status_register = ((short int *) SD_CARD_AUX_STATUS(dev->base));
a macro SD_CARD_AUX_STATUS is invoked with the argument dev->base (where base is a field in the structure describing the sd card device). The -> operator dereferences a pointer to a struct and is similar to the . operator: it gives access to the fields of the struct that the pointer points to. The macro returns some kind of a status of the device. Hard to tell why it is cast to a pointer to short int, but the result is stored in a variable aux_status_register.
Without additional information it's impossible to tell anything more about the code.
dev is a pointer to a data structure in memory - called a struct in C. A struct has members - a set of variables within it. dev->base means access the member called base within the struct of type alt_up_sd_card_dev which dev is pointing to.
Look for the definition of struct alt_up_sd_card_dev which you will find in one of the header files included from the one you are looking at.
In general the -> operator is said to de-reference the pointer.
SD_CARD_AUX_STATUS is probably a macro - traditionally these are named all upper case. It might perform some kind of conversion or even call a function. Search for its definition in the headers.
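Without the real headers, one can only guess at the macro, but a common pattern for this kind of code is a function-like macro that adds a fixed register offset to a device's base address. The sketch below illustrates that pattern only: struct demo_dev, DEMO_AUX_STATUS, and the offset are all hypothetical and are not taken from the actual SD card driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device descriptor; the real alt_up_sd_card_dev differs. */
struct demo_dev { uintptr_t base; };

/* Hypothetical function-like macro: address of the third short-sized
 * register counting from the device's base address. */
#define DEMO_AUX_STATUS(base) ((base) + 2 * sizeof(short))

/* Turns the computed address into a pointer the program can read through. */
volatile short *demo_aux_status_register(const struct demo_dev *dev)
{
    assert(dev != NULL);
    return (volatile short *)DEMO_AUX_STATUS(dev->base);
}
```

Under this reading, the cast to a pointer in the original code would be needed because the macro yields a plain integer address rather than a pointer.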
I need help figuring out the correct data type for an assignment from a function call please.
I'm trying to get at the data in the content field of N_Vector u. Here's what the documentation says about N_Vector:
The type N_Vector is defined as
N_Vector u;
typedef struct _generic_N_Vector *N_Vector;
struct _generic_N_Vector {
void *content;
struct _generic_N_Vector_Ops *ops;
};
...
[The parallel NVECTOR module] defines the content field of N_Vector
to be a structure containing global and local lengths, a pointer to the
beginning of contiguous local data array, MPI communicator and flag.
struct _N_VectorContent_Parallel {
long int local_length;
long int global_length;
booleantype own_data;
realtype *data;
MPI_Comm comm;
};
So I guess that means that content in _generic_N_Vector "points to" a structure of type _N_VectorContent_Parallel (right?).
Then I try to use a macro for accessing content. Here's the documentation for NV_CONTENT_P.
v_cont=NV_CONTENT_P(v) sets v_cont to be a pointer to the N_Vector content
structure of type struct _N_VectorParallelContent.
Notice the different name of the struct!
What does that mean? What type do I declare v_cont to be?
I tried
N_Vector u;
...
_N_VectorParallelContent *v_cont1;
_N_VectorContent_Parallel *v_cont2;
v_cont1 = NV_CONTENT_P(u);
v_cont2 = NV_CONTENT_P(u);
but these declarations got the error "'_N_VectorContent_Parallel' undeclared..." or "'_N_VectorParallelContent' undeclared...".
But it seems that these structures must be declared already. I successfully declared (and used) u, of type N_Vector. And the docs seem to say that N_Vector contains one of those two structures (or maybe both).
So why the error message? What is the correct data type to declare for v_cont to receive data from NV_CONTENT_P?
I know this is a long, detailed question, but I don't understand enough to whittle it down any more.
Thanks for your help.
I'm not familiar with this particular library, but it looks to me like the documentation is a little inconsistent.
Right after the blurb about NV_CONTENT_P(v), it says NV_CONTENT_P(v) is defined as:
#define NV_CONTENT_P(v) ( (N_VectorContent_Parallel)(v->content) )
So that version of the name is probably correct. I can't see a definition for N_VectorContent_Parallel on that page, but it's probably defined somewhere as something like struct _N_VectorContent_Parallel*. So, you can probably do:
N_VectorContent_Parallel v_cont1 = NV_CONTENT_P(u);
Remember that for structs, struct is part of the type name. This means that you're getting errors in your example because you haven't included struct:
// this is an unknown type
_N_VectorParallelContent *v_cont1;
// this is a pointer to a "struct _N_VectorContent_Parallel"
struct _N_VectorContent_Parallel *v_cont1;
// But use this one, as it follows the macro
N_VectorContent_Parallel v_cont1;
If you want to see exactly what the preprocessor has done to your code, you can use gcc's -E flag.
-E Stop after the preprocessing stage; do not run the compiler proper.
The output is in the form of preprocessed source code, which is sent to
the standard output.
Input files which don't require preprocessing are ignored.
This is especially useful for seeing the results of macros and multiple complex header files.
Edit: From the source you've linked:
typedef struct _N_VectorContent_Parallel *N_VectorContent_Parallel;
This is a type definition that says that N_VectorContent_Parallel is the same as a struct _N_VectorContent_Parallel * (a pointer to a struct _N_VectorContent_Parallel), which means you can access v_cont1 using the -> syntax:
N_VectorContent_Parallel v_cont1 = NV_CONTENT_P(u);
printf("%ld\n", v_cont1->local_length);
a->b is shorthand for (*a).b - it's just a cleaner-looking way of writing the dereference needed to access a member of a struct through a pointer to that struct. If that seems confusing, see my answer to this question.
Personally, I don't like typedefs that hide pointers like this one, because it's hard to tell by looking at the code whether you need to use a.b or a->b.
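To see the pieces fit together without the library installed, here is a sketch with trimmed stand-in definitions that mirror the quoted documentation. These are not the real SUNDIALS headers: realtype is replaced by double, several fields are omitted, and the local_length_of helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins mirroring the quoted documentation, not the real headers. */
struct _generic_N_Vector { void *content; };
typedef struct _generic_N_Vector *N_Vector;

struct _N_VectorContent_Parallel {
    long int local_length;
    long int global_length;
    double *data;               /* realtype in the real library */
};
typedef struct _N_VectorContent_Parallel *N_VectorContent_Parallel;

#define NV_CONTENT_P(v) ((N_VectorContent_Parallel)((v)->content))

/* Both N_Vector and N_VectorContent_Parallel are pointer typedefs,
 * so members are reached with -> in both cases. */
long int local_length_of(N_Vector v)
{
    assert(v != NULL && v->content != NULL);
    N_VectorContent_Parallel c = NV_CONTENT_P(v);
    return c->local_length;
}
```

With the real library, the declarations would come from its headers and only the last function would be written by the user.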
I have code in my header file that looks like:
typedef struct _bn bnode;
I can do
bnode b;
just fine, but
b[i], where i is an int gives me the following error:
invalid use of undefined type ‘struct _bn’
Any ideas?
As stated, b is not an array and, as such, can not be accessed as one.
Also, how do you expect the compiler to figure out the size of that structure? When you write something like bnode b[i], a certain amount of space has to be set aside for later use. As you have it, there is no size information.
What is your opacity intended to do for you? Maybe if you explain further what you are trying to accomplish you will get a more revealing answer...
As far as an API/library goes, normally if you're going to need an opaque structure, you don't allow the user of the API to declare things like arrays or static instances because of this. Not knowing anything about the structure is the name of the game so you're probably going to have to define some functions to manipulate them. Most C libraries that declare opaque structures also provide accessor and modifier functions.
One example is from Lua (obviously a Lua state is a single-use structure, but it's the idea):
typedef struct lua_State lua_State;
void lua_pushnumber(lua_State *s, lua_Number n);
In this case, if you decided you needed multiple Lua states, you would do something like the following:
lua_State *states[5];
for(int i = 0; i < 5; i++)
states[i] = lua_open();
I think the general rule-of-thumb is that if you're working with opaque structures, you're going to be working through pointers only, which is pretty much the only way to go about it anyway.
Sounds like you either want an opaque pointer/PIMPL implementation, or you should include the appropriate header file.
Structs in C++ are almost identical to classes, so the same techniques apply.
You can't define an array of opaque structs. If you do you get an error such as:
error: array type has incomplete element type
(the specific error text will vary; the one above is from gcc 4.4.1).
But what you can do is create an array of pointers to opaque structs. This is doable as the details of the struct do not affect the size of the pointer.
typedef struct _bn bnode;
bnode *b[20];
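A sketch of how such an array of pointers might be filled and used follows. The bnode_new, bnode_key, and bnode_free functions and the key member are hypothetical, since the original post does not show the library's API. The struct definition is included only so the listing is complete; in practice it would be hidden in the library's source file.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _bn bnode;          /* callers only ever see this line */

/* Hypothetical API; these names are not from the original post. */
bnode *bnode_new(int key);
int bnode_key(const bnode *b);
void bnode_free(bnode *b);

/* Normally hidden in the library's .c file. */
struct _bn { int key; };

bnode *bnode_new(int key)
{
    bnode *b = malloc(sizeof *b);
    if (b != NULL)
        b->key = key;
    return b;
}

int bnode_key(const bnode *b) { assert(b != NULL); return b->key; }
void bnode_free(bnode *b) { free(b); }
```

Client code can then declare bnode *b[20]; and fill each slot with bnode_new(...), indexing the array of pointers rather than an array of structs.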
You have to at least know the size of bnode to be able to make an array of them.
You could do, in your opaque definition of bnode:
#include <stdint.h> /* uint8_t */
typedef struct bnode_struct {
uint8_t opaque_bytes[1024]; /* magically just "know" how big it is. */
} bnode;
Then you can do:
bnode b[10];
and it will work.