enumerating over a structure fields in C - c

I have several structs in C and I want to write the following three functions:
get_field_list(...)
get_value_by_name(...)
set_value_by_name(...)
The first should return the list of fields defined in the struct. The second and third should get and set the appropriate field by its name.
I'm writing the structs. I'm willing to use any macro magic if required. It's OK if I'll have a triplet of functions per struct, but generic functions are better. Function pointers are also fine...
Basically I want some elementary reflection for structs....
Relevant:
https://natecraun.net/articles/struct-iteration-through-abuse-of-the-c-preprocessor.html
motivation
I'm trying to build a DAL (Data Access Layer) for a native app written in C. I'm using SQLite as a DB. I need to store various structures, and to be able to insert / update / get (select by key) / search (select by query), and also to create / drop the required table.
Basically I want something like Hibernate for C ...
My best idea so far is to use macros, or some code generation utility, or a script, to create my structs together with metadata I could use to dynamically build all my SQL commands, and also to have a small 'generic' module to implement all the basic procedures I need...
Different or better ideas to solve my actual problem will also be appreciated!

It can be done with "macro magic" as you suggested:
For each struct, create a header file (mystruct-fields.h) like this:
FIELD(int, field1)
FIELD(int*, field2)
FIELD(char*, string1)
Then, in another header (mystruct.h) you include that as many times as you need:
#define FIELD(T,N) T N;
struct mystruct {
#include "mystruct-fields.h"
};
#undef FIELD
#include <stddef.h> /* offsetof, size_t */
#define STRINGIFY1(S) #S
#define STRINGIFY(S) STRINGIFY1(S)
#define FIELD(T,N) { STRINGIFY(T), STRINGIFY(N), offsetof(struct mystruct, N) },
static const struct mystruct_field {
    const char *type, *name;
    size_t offset;
} table[] = {
#include "mystruct-fields.h"
    {NULL, NULL, 0} /* sentinel marking the end of the table */
};
#undef FIELD
You can then implement your reflection functions, using the table, however you choose.
It might be possible, using another layer of header file includes, to reuse the above code for any struct without rewriting it, so your top-level code might only have to say something like:
#define STRUCT_NAME mystruct
#include "reflectable-struct.h"
#undef STRUCT_NAME
Honestly though, it's easier for the people who come after you if you just write the struct normally, and then write out the table by hand; it's much easier to read, your IDE will be able to auto-complete your types, and prominent warnings in the comments should help prevent people breaking it in future (and anyway, you do have tests for this right?)
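For what it's worth, here is a minimal sketch of how the three functions from the question could sit on top of such a table. To keep it in one file, it assumes the field list lives in a macro (MYSTRUCT_FIELDS, a made-up name) instead of mystruct-fields.h; the fields id and score, the extra size column, and the helper find_field are also assumptions for illustration, not part of the answer above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field list, kept in a macro instead of a separate header. */
#define MYSTRUCT_FIELDS(FIELD) \
    FIELD(int, id)             \
    FIELD(double, score)

/* First expansion: the struct itself. */
#define AS_MEMBER(T, N) T N;
struct mystruct { MYSTRUCT_FIELDS(AS_MEMBER) };

/* Second expansion: the metadata table, terminated by a NULL entry. */
struct field_info {
    const char *type, *name;
    size_t offset, size;
};
#define AS_INFO(T, N) { #T, #N, offsetof(struct mystruct, N), sizeof(T) },
static const struct field_info mystruct_fields[] = {
    MYSTRUCT_FIELDS(AS_INFO)
    { NULL, NULL, 0, 0 }
};

/* Returns the NULL-terminated field table. */
const struct field_info *get_field_list(void) { return mystruct_fields; }

/* Linear search by field name; returns NULL if the name is unknown. */
static const struct field_info *find_field(const char *name)
{
    for (const struct field_info *f = mystruct_fields; f->name != NULL; f++)
        if (strcmp(f->name, name) == 0)
            return f;
    return NULL;
}

/* Copies the named field into out; 0 on success, -1 on unknown name or size mismatch. */
int get_value_by_name(const struct mystruct *s, const char *name,
                      void *out, size_t out_size)
{
    const struct field_info *f = find_field(name);
    if (f == NULL || f->size != out_size)
        return -1;
    memcpy(out, (const char *)s + f->offset, f->size);
    return 0;
}

/* Copies in into the named field; 0 on success, -1 on unknown name or size mismatch. */
int set_value_by_name(struct mystruct *s, const char *name,
                      const void *in, size_t in_size)
{
    const struct field_info *f = find_field(name);
    if (f == NULL || f->size != in_size)
        return -1;
    memcpy((char *)s + f->offset, in, f->size);
    return 0;
}
```

The size check is only a cheap guard against passing a buffer of the wrong type; a DAL would probably want a proper type tag in the table as well, so it can pick the matching SQLite bind call.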

The way to do it is to keep your struct definitions in a database, XML, a text file, or whatever format you are comfortable with, and use a C program to write a .h file for each struct. The .h file contains the struct, an enum of the fields, and an array of strings containing the names of each field. From there you can build anything you need, preferably using a program generator.

Take a look at the Metaresc library. It provides reflection capabilities in plain C. Type metadata can be derived either from a custom macro language that replaces the standard C type definition syntax or from compiler debug info. A sample app is provided in README.md.

Related

#define in C, legal character

There is a C structure
struct a
{
    int val1,val2;
};
I have made changes to the code like
struct b
{
    int val2;
};
struct a
{
    int val1;
    struct b b_obj;
};
Now, usage of val2 in the other C files is like a_obj->val2;.
I want to avoid editing every use of val2, and there are a lot of them, so I have defined a macro in the header file where struct a is defined, as follows:
#define a_obj->val2 (a_obj->b_obj.val2)
It's not working. Is -> illegal in the identifier part of a macro definition #define?
Could someone please tell me where I am going wrong?
Edit as suggested by @Basile -
It's a legacy source code, a very huge project. Not sure of LOC.
I want to make such changes because I want to make it more modular.
For example I want to group similar fields of the structure under a same name and that's the reason I want to create another struct B with fields which are related to B feature and also common to A.
I can't use Find Replace feature of other text editors, I am using VIM.
This kind of macro magic will get you into trouble soon,
because it is making your source code unreadable and brittle (credits Basile for the phrasing).
But this should work for what you describe.
struct b
{
    int val2m;
};
struct a
{
    int val1;
    struct b b_obj;
};
#define val2 b_obj.val2m
The trick is to give the actual identifier inside the struct declaration a new name (val2m), so that the name all the other code uses can be turned into a magic alias,
which then can contain the modified access to take a detour via the additionally introduced inner struct.
This is only a kind of band-aid for the problematic situation of having to change something backstage in existing code with many references. Only use it if there is no chance of refactoring the code cleanly. ("band-aid", appropriate image by StoryTeller, credits).
I explicitly recommend looking at Basile's answer for a cleaner, more "future-proof" way. It is the way to go to avoid the trouble I predict with this macro magic. Use it unless very good reasons force you to do otherwise.
As others explained, the preprocessor works only on tokens, and you can only #define a name (an identifier). Read the documentation of cpp and the C11 standard (draft n1570).
What you want to do is very ugly (and there are few occasions where it is worthwhile). It makes your code messy, unreadable, and brittle.
Learn to use your source code editor better (you probably have some interactive replace, or interactive replace with regexps; if you don't, switch to a better editor like GNU emacs or vim, and study the documentation of your editor). You could also use scripting tools like ed, sed, grep, awk etc... to help you in doing those replacements.
In a small project, replacing relevant occurrences of ->val2 (or .val2) with ->b_obj.val2 (or .b_obj.val2) is really easy, even if you have a hundred of them. And that keeps your code readable. Don't forget to use some version control system (to keep both old and new versions of your code).
In a large project of at least a million lines of source code, you might ask how to find every occurrence of field usage of val2 for a given type (but you should probably name val2 well enough to have most occurrences of it be relevant; in other words, take care of the naming of your fields). That is a very different question (e.g. you could write some GCC plugin to find such occurrences and help you in replacing the relevant ones).
If you are refactoring an old and large legacy code, you need to be sure to keep it readable, and you don't want fancy macro tricks. For example, you might add some static inline function to access that field. And it could be then worthwhile to use some better tools (e.g. a compiler plugin, some kind of C parser, etc...) to help you in that refactoring.
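A minimal sketch of the static inline accessor idea, assuming the struct layout from the question; the names a_get_val2 and a_set_val2 are made up for illustration:

```c
struct b {
    int val2;
};

struct a {
    int val1;
    struct b b_obj;
};

/* Hypothetical accessors: callers no longer depend on where val2 lives. */
static inline int a_get_val2(const struct a *p)
{
    return p->b_obj.val2;
}

static inline void a_set_val2(struct a *p, int v)
{
    p->b_obj.val2 = v;
}
```

If the field moves again later, only these two functions need to change, and an optimizing compiler will normally inline them so there is no call overhead.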
Keep the source code readable by human developers. Otherwise, you are shooting yourself in the foot. What you want to do is unreasonable, it decreases the readability of the code base.
I can't use Find Replace feature of other text editors, I am using VIM.
vim is scriptable (e.g. in lua) and accepts plugins (so if interactive replace is not enough, consider writing some vim plugin or script to help you), and has powerful find-replace-regexp facilities. You might also use some combination of scripts to help you. In many cases they are enough. If they are not, you should explain why.
Also, you could temporarily replace the val2 field of struct a with a unique name like val2_3TYRxW1PuK7 (or whatever is appropriate, making some unique "random-looking" name is easy). Then you run your full build (e.g. after some make clean). The compiler would emit error messages for every place where you need to replace val2 used as a field of struct a (but won't mind for any other occurrence of the val2 name used for some other purpose). That could help you a lot -once you have corrected your code to get rid of all errors- (especially when combined with some editor scripting) because then you just need to replace val2_3TYRxW1PuK7 with b_obj.val2 everywhere.
Is -> illegal in #define?
Yes.
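A #define identifier can only contain letters, digits, and underscores (and cannot start with a digit).
Macro names must be regular identifiers, so you can't use any special character like - or >.
I think you may be able to use a union instead, like this:
struct b
{
    int val2;
};
struct a
{
    int val1;
    union { /* anonymous union: needs C11 or a compiler extension */
        struct b b_obj;
        int val2; /* overlays b_obj.val2 as long as val2 is the first member of struct b */
    };
};
so you can keep using a_obj->val2.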

Design issue in C

I'm struggling with a design issue, and I'm trying to find the "Best Practice" answer for my situation.
Say I have a file called Logger.c (And Logger.h) that is responsible for logging actions in my program.
I want the logger to be referenced by all my modules, so each module has an
#include "Logger.h".
Say I have a module called NTFS.c that is responsible for interaction with the NTFS FS, This module has special structs that are defined in its header, for example: NTFS_Partition.
Here is the problem:
On one hand, I want the logger to be able to print a formatted representation of NTFS_Partition to a log file, and for that I must #include NTFS.h in Logger.h.
(Inside Logger.h)
#include "NTFS.h"
VOID Log_Partition(NTFS_Partition *part);
On the other hand, I am not sure Logger should reference the modules that reference it.
Currently I see two main choices:
1. Logger.h includes NTFS.h, and NTFS.c includes Logger.h (this works).
2. I create a new header file called NTFS_Types.h that would be shared across all the modules and would only contain the declarations of the NTFS structs (like NTFS_Partition).
Thanks a lot,
Michael.
You can create a shared header where all your structs are defined.
// structs.h
struct NTFS_Partition { .. };
struct FAT32_Partition { .. };
struct FAT16_Partition { .. };
Include it in logger.h.
// logger.h
#include "structs.h"
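// C has no function overloading, so each function needs its own name
VOID Log_NTFS_Partition(struct NTFS_Partition *part);
VOID Log_FAT32_Partition(struct FAT32_Partition *part);
VOID Log_FAT16_Partition(struct FAT16_Partition *part);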
And include the logger.h in various source files.
// NTFS.c
#include "logger.h"
// FAT32.c
#include "logger.h"
// FAT16.c
#include "logger.h"
In C++, it's better to keep unrelated class definitions in different header files. But in C, placing different struct definitions in separate headers is probably overkill.
It isn't entirely clear whether you are coding in C or C++; I'm going to assume C (so no overloaded function names, etc). It seems to me that you need to 'forward declare' your structures. In Logger.h, you write:
#ifndef LOGGER_H_INCLUDED
#define LOGGER_H_INCLUDED
struct NTFS_Partition; // No details - just the name (3 times)
struct FAT16_Partition;
struct FAT32_Partition;
...
void Log_NTFS_Partition(struct NTFS_Partition *part);
void Log_FAT16_Partition(struct FAT16_Partition *part);
void Log_FAT32_Partition(struct FAT32_Partition *part);
#endif // LOGGER_H_INCLUDED
This is all the information that a general client (of Logger.h) needs to know.
If a specific client is dealing with NTFS partitions, then it will not only include Logger.h but also NTFS.h, which will provide the full definition of struct NTFS_Partition { ... };, so the client can create instances of the structure and populate it with data. The code that implements the logging, Logger.c, will also include Logger.h and NTFS.h (and FAT16.h and FAT32.h), of course, so that it too can reference the members of the structures.
The header for a service (such as Logger.h) should provide the minimal amount of information that the clients of the service need for compilation. The implementation file may need more information, but can collect the extra information from headers that provide it.
One advantage of using the struct tag notation is precisely that it can be repeated as often as necessary without messing anything up. If you don't have C11, you can't repeat a typedef, so if you write:
typedef struct NTFS_Partition NTFS_Partition;
you must only include that line once. The difficulty is making sure that it is only defined once. For that, you probably use a header such as FSTypes.h to define the file-system typedefs that is properly protected by header guards and is included in any file that needs any of the typedefs. You can then reference the types without the preceding struct keyword.
If you code in C++, the typedef isn't necessary; struct NTFS_Partition; declares that there is such a structure type and also declares NTFS_Partition as a name for that type. If your code is bilingual, use the typedef version; it works in both C and C++.
Note that if your functions such as Log_NTFS_Partition() take an actual structure instead of a pointer to a structure, then you have to have the definition of the structure in scope. If the functions only take pointers, though, a forward declaration is sufficient.
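To make the split concrete, here is a single-file sketch of the three pieces. The struct members are invented for illustration, and unlike the prototypes above the function takes a FILE * and returns the fprintf result, purely so the sketch can be checked; treat both as assumptions.

```c
#include <stdio.h>

/* --- would live in Logger.h: a forward declaration is all the prototype needs --- */
struct NTFS_Partition;
int Log_NTFS_Partition(FILE *out, const struct NTFS_Partition *part);

/* --- would live in NTFS.h: the full definition (members are hypothetical) --- */
struct NTFS_Partition {
    unsigned long long start_sector;
    unsigned long long sector_count;
};

/* --- would live in Logger.c, which includes both headers --- */
int Log_NTFS_Partition(FILE *out, const struct NTFS_Partition *part)
{
    /* Member access is only legal here because the full definition is in scope. */
    return fprintf(out, "NTFS partition: start=%llu sectors=%llu\n",
                   part->start_sector, part->sector_count);
}
```

A client that never touches NTFS partitions can include Logger.h alone and still compile, because the prototype only mentions a pointer to the incomplete type.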

Why should structure names have a typedef?

I have seen source code that always has a typedef for a structure and uses it everywhere, instead of using the structure name directly as in "struct sname".
What is the reason behind this? Are there any advantages in doing this?
It's easier to read Box b; than struct boxtype b;
typedef struct _entry{
char *name;
int id;
} Entry, *EntryP;
Advantage:
In the above typedef, both Entry & EntryP are defined apart from struct _entry.
So, EntryP firstentry can be used in place of struct _entry *firstentry and is a little simpler to parse in mind.
Note: It's not that structure names must be typedef'd, but it is evidently easier to read. Also, the choice between Entry * and EntryP is entirely up to the user.
It is an odd quirk of the C language: structure tag names are stored in a different namespace (symbol table). The C++ language got rid of this behavior. The typedef creates an alias for the structure type in the ordinary identifier namespace, so you don't have to type "struct" before the structure name.
Not sure what Ritchie was smoking when he defined this behavior. I'm guessing at some kind of problem with the parser in an early version of the compiler.
I like to do this in C:
// in a "public" header file
typedef struct Example Example;
// in a "private" header or a source file
struct Example { ... };
Like this, I have Example as an opaque type (i.e., I can only pass pointers to it about) throughout my code, except for the code that implements its operations, which can see what it is really defined as. (You could use a separate name for the opaque type but there's no real advantage to that.)
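A rough single-file sketch of that opaque-type pattern; the functions example_create, example_value and example_destroy and the value member are hypothetical, chosen only to show which side of the split can see the struct body:

```c
#include <stdlib.h>

/* --- "public" header: clients only ever see the name --- */
typedef struct Example Example;
Example *example_create(int value);
int example_value(const Example *e);
void example_destroy(Example *e);

/* --- "private" header or source file: the real definition --- */
struct Example {
    int value;
};

Example *example_create(int value)
{
    Example *e = malloc(sizeof *e);
    if (e != NULL)
        e->value = value;
    return e; /* NULL if allocation failed */
}

int example_value(const Example *e)
{
    return e->value;
}

void example_destroy(Example *e)
{
    free(e);
}
```

Client code that includes only the public part cannot declare an Example by value or touch its members, so the layout can change without recompiling the callers' logic around it.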
It's just for code readability. A typedef simply gives a new name to a data type. Instead of writing int everywhere, you can name it the way you want, and wherever you would use int you can use the new name instead. The same applies to structures.
It's a form of information hiding and useful when creating opaque types. See this SO link for a slightly longer answer.
A non-typedefed struct name in C requires "struct" to be prepended everywhere the type name is used, so most people use typedef to create a new name for the type that does not need to have the struct keyword prepended all the time.
The reasons for doing this are code readability, reduced typing, and probably clarity as well, but typedefs can actually obscure information about pointer types.
In all honesty the need to typedef to create new names for structs is a relic, and it's a shame that C99 didn't follow C++'s lead and remove it.
I actually don't use typedef for structs.
Why?
Because I would use some convention such as an s_ prefix anyway, to easily see what kind of variable I'm looking at.
Since I use a lot of abstract data types, I guess it would be a good idea to use a typedef, so that the user really uses it as an opaque type.
But in practice I always use a struct in the implementation anyway.

structure definition conflict between XS module and perl build

On OpenSolaris ($^O eq 'solaris', vers. 2.11), I'm trying to build an XS module which uses the XPGv4v2/Single Unix Spec. understanding of struct msghdr, specifically for "ancillary data" interrogation.
However, the native perl (v5.8.4) was built without the requisite defines, and so the struct msghdr visible within my XS file is the older, BSD kind:
#include "EXTERN.h"
#include "perl.h" /* older, "msg_accrights"-style msghdr now visible */
#include "XSUB.h"
....
struct msghdr m;
m.msg_control = buf; /* ERROR, structure has no member named "msg_control" */
....
Supplying the "right" #defines (_XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED) breaks the build, since it changes a great many things that perl was expecting.
Is there an elegant way I can have the XS module use the structure definition I'd like?
You either have to use the definitions that your existing perl understands, or compile a new perl with the definitions that you want.
You don't need to replace the existing perl, though. You can install the new perl separately so they don't conflict.
If you want it both ways, you have to figure out which definitions your Perl has and write code that handles the right set of definitions. You might add a layer of abstraction so you can implement the underlying bits with either set of definitions. It's a lot of repeated code probably, but that's what portability is, unfortunately. :(

C library naming conventions

Introduction
Hello folks, I recently learned to program in C! (This was a huge step for me, since C++ was the first language, I had contact with and scared me off for nearly 10 years.) Coming from a mostly OO background (Java + C#), this was a very nice paradigm shift.
I love C. It's such a beautiful language. What surprised me the most, is the high grade of modularity and code reusability C supports - of course it's not as high as in a OO-language, but still far beyond my expectations for an imperative language.
Question
How do I prevent naming conflicts between the client code and my C library code? In Java there are packages, in C# there are namespaces. Imagine I write a C library, which offers the operation "add". It is very likely, that the client already uses an operation called like that - what do I do?
I'm especially looking for a client friendly solution. For example, I wouldn't like to prefix all my api operations like "myuniquelibname_add" at all. What are the common solutions to this in the C world? Do you put all api operations in a struct, so the client can choose its own prefix?
I'm very looking forward to the insights I get through your answers!
EDIT (modified question)
Dear Answerers, thank You for Your answers! I now see that prefixes are the only way to safely avoid naming conflicts. So, I would like to modify my question: What possibilities do I have to let the client choose his own prefix?
The answer Unwind posted, is one way. It doesn't use prefixes in the normal sense, but one has to prefix every api call by "api->". What further solutions are there (like using a #define for example)?
EDIT 2 (status update)
It all boils down to one of two approaches:
Using a struct
Using #define (note: There are many ways, how one can use #define to achieve, what I desire)
I will not accept any answer, because I think that there is no correct answer. The solution one chooses rather depends on the particular case and one's own preferences. I, myself, will try out all the approaches You mentioned to find out which suits me best in which situation. Feel free to post arguments for or against certain approaches in the comments of the corresponding answers.
Finally, I would like to especially thank:
Unwind - for his sophisticated answer including a full implementation of the "struct-method"
Christoph - for his good answer and pointing me to Namespaces in C
All others - for Your great input
If someone finds it appropriate to close this question (as no further insights to expect), he/she should feel free to do so - I can not decide this, as I'm no C guru.
I'm no C guru, but from the libraries I have used, it is quite common to use a prefix to separate functions.
For example, SDL will use SDL, OpenGL will use gl, etc...
The struct way that Ken mentions would look something like this:
struct MyCoolApi
{
    int (*add)(int x, int y);
};
const struct MyCoolApi * my_cool_api_initialize(void);
Then clients would do:
#include <stdio.h>
#include <stdlib.h>
#include "mycoolapi.h"
int main(void)
{
    const struct MyCoolApi *api;
    if((api = my_cool_api_initialize()) != NULL)
    {
        int sum = api->add(3, 39);
        printf("The cool API considers 3 + 39 to be %d\n", sum);
    }
    return EXIT_SUCCESS;
}
This still has "namespace-issues"; the struct name (called the "struct tag") needs to be unique, and you can't declare nested structs that are useful by themselves. It works well for collecting functions though, and is a technique you see quite often in C.
UPDATE: Here's how the implementation side could look, this was requested in a comment:
#include "mycoolapi.h"
/* Note: This does **not** pollute the global namespace,
 * since the function is static.
 */
static int add(int x, int y)
{
    return x + y;
}
const struct MyCoolApi * my_cool_api_initialize(void)
{
    /* Since we don't need to do anything at initialize,
     * just keep a const struct ready and return it.
     */
    static const struct MyCoolApi the_api = {
        add
    };
    return &the_api;
}
It's a shame you got scared off by C++, as it has namespaces to deal with precisely this problem. In C, you are pretty much limited to using prefixes - you certainly can't "put api operations in a struct".
Edit: In response to your second question regarding allowing users to specify their own prefix, I would avoid it like the plague. 99.9% of users will be happy with whatever prefix you provide (assuming it isn't too silly) and will be very UNHAPPY at the hoops (macros, structs, whatever) they will have to jump through to satisfy the remaining 0.1%.
As a library user, you can easily define your own shortened namespaces via the preprocessor; the result will look a bit strange, but it works:
#define ns(NAME) my_cool_namespace_ ## NAME
makes it possible to write
ns(foo)(42)
instead of
my_cool_namespace_foo(42)
As a library author, you can provide shortened names as described here.
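Both variants can be sketched in one file. The ns macro is the one shown above; the function my_cool_namespace_add and the opt-in switch MY_COOL_SHORT_NAMES are made-up names, and this is only one possible way a library could offer short names, not necessarily the one the link describes.

```c
/* --- library header (hypothetical): the canonical, prefixed name --- */
int my_cool_namespace_add(int x, int y);

/* Opt-in short name: only clients that define MY_COOL_SHORT_NAMES
 * before including the header get the unprefixed alias. */
#ifdef MY_COOL_SHORT_NAMES
#define add my_cool_namespace_add
#endif

/* --- client side: a private shorthand for the prefix --- */
#define ns(NAME) my_cool_namespace_ ## NAME

/* --- library implementation --- */
int my_cool_namespace_add(int x, int y)
{
    return x + y;
}
```

With this in place, ns(add)(3, 39) expands to my_cool_namespace_add(3, 39). The opt-in alias is the riskier of the two, because a macro named add rewrites every later use of that identifier in the client's translation unit.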
If you follow unwind's advice and create an API structure, you should make the function pointers compile-time constants to make inlining possible, i.e. in your .h file, use the following code:
// canonical name
extern int my_cool_api_add(int x, int y);
// API structure
struct my_cool_api
{
int (*add)(int x, int y);
};
typedef const struct my_cool_api *MyCoolApi;
// define in header to make inlining possible
static MyCoolApi my_cool_api_initialize(void)
{
static const struct my_cool_api the_api = { my_cool_api_add };
return &the_api;
}
Unfortunately, there's no sure way to avoid name clashes in C. Since it lacks namespaces, you're left with prefixing the names of global functions and variables. Most libraries pick some short and "unique" prefix (unique is in quotes for obvious reasons), and hope that no clashes occur.
One thing to note is that most of the code of a library can be statically declared - meaning that it won't clash with similarly named functions in other files. But exported functions indeed have to be carefully prefixed.
Since you are exposing functions with common names, the client cannot include your library's header files along with other header files that have colliding names. In this case you can add the following to the header file before the function prototype; it doesn't affect client usage either.
#define add myuniquelibname_add
Please note this is a quick fix solution and should be the last option.
For a really huge example of the struct method, take a look at the Linux kernel; 30-odd million lines of C in that style.
Prefixes are the only choice at the C level.
On some platforms (those that support separate namespaces for linkers, like Windows, OS X and some commercial Unices, but not Linux and FreeBSD) you can work around conflicts by putting the code in a library and exporting only the symbols you really need from it (and, e.g., aliasing in the import library in case there are conflicts in exported symbols).
