When I want to access a value in a C struct from assembly, I usually just add an offset to the struct's address. However, when the struct is large, this gets tiring. Is there a trick or a standard option for declaring a global label inside it?
How to access C struct value in assembly?
To get the addresses of individual members, like you rightly pointed out, you have to add a constant offset to the address of the struct. The problem essentially boils down to automatically calculating these offsets for a given struct and member.
Even for small structs, I would suggest not calculating the offsets by hand, because compilers insert different amounts of padding between members, and hints like __attribute__((packed)) are not available in all compilers.
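To make the "base address plus constant offset" idea concrete, here is a minimal sketch in C that reads a member the same way assembly code would. The struct task and the function read_priority_by_offset are hypothetical names invented for this illustration; offsetof already accounts for whatever padding the compiler inserts.

```c
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

/* Hypothetical struct; the compiler will typically pad after `state`. */
struct task {
    char  state;
    int   priority;
    void *stack_ptr;
};

/* Reads `priority` as assembly would: struct base address + constant offset. */
int read_priority_by_offset(const struct task *t)
{
    const unsigned char *base = (const unsigned char *)t;
    int value;

    /* memcpy avoids any aliasing or alignment assumptions. */
    memcpy(&value, base + offsetof(struct task, priority), sizeof value);
    return value;
}
```

The constant that offsetof yields here is exactly the number the assembly side needs, which is why the rest of this answer is about exporting it automatically.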
For most of my projects I use a separate program to generate an offsets header at build time.
The basic idea is to create a .c/.cpp file with something like -
// Your program headers that define the structs here
#include <stdio.h>
#include <stddef.h>
#define OFFSET(x, str, mem) printf("#define " #x " %d\n",(int) offsetof(str, mem))
#define SIZE(x, str) printf("#define " #x " %d\n", (int) sizeof(str))
#define HEADER_START() printf("#ifndef ASM_OFFSET_H\n#define ASM_OFFSET_H\n")
#define HEADER_END() printf("#endif\n")
int main(int argc, char *argv[]) {
HEADER_START();
OFFSET(struct_name_member_name, struct_name, member_name); // Substitute your struct name and member name here
SIZE(struct_name_size, struct_name); // This macro is to get the size of a struct
HEADER_END();
return 0;
}
Running this program at build time (with its standard output redirected to a file) generates a header that can be included directly from GAS source files that are run through the C preprocessor. For other assemblers, you can change the output syntax to generate appropriate .inc files.
The benefit of this approach is that the program uses the same headers as your C/C++ code. That way when you change any headers, the generated offsets should change automatically.
One thing you have to be careful about is that this technique does not work well if you are cross-compiling for a different architecture and the compiler uses a different layout on the host than on the target. If you want something similar for cross-compilation, I can write a separate answer.
If the structure is not too complicated, try a web search for "masm h2inc". H2INC.EXE (which also needs H2INC.ERR) is an old Microsoft utility that converts a .h file for a C program into a .inc file for MASM. Although it is part of a 16-bit toolset, the programs are 32-bit programs that use a DOS extender and will run on Win XP and Win 7 64-bit; I haven't tested Win 10 64-bit.
It doesn't seem to support 64-bit integers or 64-bit pointers (since it was meant for 16-bit Microsoft C code). You could work with a modified .h file and then fix up the generated .inc file afterwards.
There are other versions of h2inc from third parties that can support more current features, some of which will show up in a search for "masm h2inc".
I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
In the real application, each definition corresponds to an instruction in an interpreter virtual machine. The macros are also not sequential in numbering to leave space for future instructions; there may be a #define FOO 41, then the next one is #define BAR 64.
I'm now working on a debugger for this virtual machine, and need to effectively 'reverse' these preprocessor macros. In other words, I need a function which takes the number and returns the macro name, e.g. an input of 2 returns "BAR".
Of course, I could create a function using a switch myself:
const char* instruction_by_id(int id) {
switch (id) {
case FOO:
return "FOO";
case BAR:
return "BAR";
case BAZ:
return "BAZ";
default:
return "???";
}
}
However, this will be a nightmare to maintain, since renaming, removing or adding instructions will require this function to be modified too.
Is there another macro which I can use to create a function like this for me, or is there some other approach? If not, is it possible to create a macro to perform this task?
I'm using gcc 6.3 on Windows 10.
You have the wrong approach. Read SICP if you have not read it.
I have a lot of preprocessor macro definitions, like this:
#define FOO 1
#define BAR 2
#define BAZ 3
Remember that C or C++ code can be generated, and it is quite easy to instruct your build automation tool to generate some particular C file (with GNU make or ninja you just add some rule or recipe).
For example, you could use a different preprocessor (like GPP or m4), or a script (e.g. in awk, Python or Guile), or write your own program (in C, C++, OCaml, etc.) to generate the header file containing these #define-s. Another script or program (or the same one, invoked differently) could generate the C code of instruction_by_id.
Such basic metaprogramming techniques (generating one or several C files from something higher level but specific) have been used since at least the 1980s (e.g. with yacc or RPCGEN). The C preprocessor facilitates that with its #include directive (since you can even include lines inside some function body). Actually, the idea that code is data (and proof) and data is code is even older (Church-Turing thesis, Curry-Howard correspondence, Halting problem). The Gödel, Escher, Bach book is very entertaining.
For example, you could decide to have a textual file opcodes.txt (or even some sqlite database containing stuff....) like
# ignore lines starting with a hash sign
FOO 1
BAR 2
and have two small awk or Python scripts (or two tiny C specialized programs), one generating the #define-s (into opcode-defines.h) and another generating the body of instruction_by_id (into opcode-instr.inc). Then you need to adapt your Makefile to generate these, and put #include "opcode-defines.h" inside some global header, and have
const char* instruction_by_id(int id) {
switch (id) {
#include "opcode-instr.inc"
default: return "???";
}
}
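As a rough sketch of one of the tiny generator programs mentioned above, the function below turns a single line of opcodes.txt into one case label for opcode-instr.inc. The name emit_case_line and the buffer sizes are assumptions made for this illustration; a real generator would loop over the file with fgets and write each result to the output file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Converts one line such as "FOO 1" into: case FOO: return "FOO";
 * Returns 1 if a case line was written to `out`, 0 for comment lines,
 * blank lines, or lines that do not match the "NAME value" shape.
 */
int emit_case_line(const char *line, char *out, size_t outlen)
{
    char name[64];
    int value;

    line += strspn(line, " \t");              /* skip leading blanks */
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;                             /* comment or empty line */

    /* The value is parsed only to validate the line's format. */
    if (sscanf(line, "%63s %d", name, &value) != 2)
        return 0;

    snprintf(out, outlen, "case %s: return \"%s\";\n", name, name);
    return 1;
}
```

A second function of the same shape, printing "#define %s %d\n" instead, would produce opcode-defines.h from the same input.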
this will be a nightmare to maintain,
Not so with such a metaprogramming approach. You'll just maintain opcodes.txt and the scripts using it, and you express a given "knowledge element" (the relation of FOO to 1) only once (in a single line of opcodes.txt). Of course you need to document that (at the very least, with comments in your Makefile).
Metaprogramming from some higher-level, declarative formalization is a very powerful paradigm. In France, J. Pitrat has pioneered it since the 1960s (and he is writing an interesting blog today, while being retired). In the US, J. McCarthy and the Lisp community did as well.
For an entertaining talk, see Liam Proven's FOSDEM 2018 talk, The circuit less traveled.
Large software projects use that metaprogramming approach quite often. For example, the GCC compiler has about a dozen C++ code generators (in total, they emit more than a million lines of C++).
Another way of looking at such an approach is the idea of domain-specific languages that could be compiled to C. If you use an operating system providing dynamic loading, you can even write a program emitting C code, forking a process to compile it into some plugin, then loading that plugin (on POSIX or Linux, with dlopen). Interestingly, computers are now fast enough to enable such an approach in an interactive application (in some sort of REPL): you can emit a C file of a few thousand lines, compile it into some .so shared object file, and dlopen that, in a fraction of second. You could also use JIT-compiling libraries like GCCJIT or LLVM to generate code at runtime. You could embed an interpreter (like Lua or Guile) into your program.
BTW, metaprogramming approaches are one of the reasons why basic compilation techniques should be known by most developers (and not just people in the compiler business); another reason is that parsing problems are very common. So read the Dragon Book.
Be aware of Greenspun's tenth rule. It is much more than a joke, actually a profound truth about large software.
In a similar case I've resorted to defining a text file format that defines the instructions, and writing a program to read this file and write out the C source of the actual instruction definitions and the C source of functions like your instruction_by_id(). This way you only need to maintain the text file.
As awesome as general code generation is, I’m surprised that nobody mentioned that (if you relax your problem definition just a bit) the C preprocessor is perfectly capable of generating the necessary code, using a technique called X macros. In fact every simple bytecode VM in C that I’ve seen uses this approach.
The technique works as follows. First, there is a file (call it insns.h) containing the authoritative list of instructions,
INSN(FOO, 1)
INSN(BAR, 2)
INSN(BAZ, 3)
or alternatively a macro in some other header containing the same,
#define INSNS \
INSN(FOO, 1) \
INSN(BAR, 2) \
INSN(BAZ, 3)
whichever is more convenient for you. (I’ll use the first option in the following.) Note that INSN is not defined anywhere. (Traditionally it would be called X, thus the name of the technique.) Wherever you want to loop over your instructions, define INSN to generate the code you want, include insns.h, then undefine INSN again.
In your disassembler, write
const char *instruction_by_id(int id) {
switch (id) {
#define INSN(NAME, VALUE) \
case NAME: return #NAME;
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
default: return "???";
}
}
using the prefix stringification operator # to turn names-as-identifiers into names-as-string-literals.
You obviously can’t define the constants this way, because macros cannot define other macros in the C preprocessor. However, if you don’t insist that the instruction constants be preprocessor constants, there’s a different perfectly serviceable constant facility in the C language: enumerations. Whether or not you use an enumerated type, the enumerators defined inside it are regular integer constants from the point of view of the compiler (though not the preprocessor—you cannot use #ifdef with them, for example). So, using an anonymous enumeration type, define your constants like this:
enum {
#define INSN(NAME, VALUE) \
NAME = VALUE,
#include "insns.h" /* or just INSNS if you use a macro */
#undef INSN
NINSNS /* C89 doesn’t allow trailing commas in enumerations (but C99+ does), and you may find this constant useful in any case */
};
If you want to statically initialize an array indexed by your bytecodes, you’ll have to use C99 designated initializers {[FOO] = foovalue, [BAR] = barvalue, /* ... */} whether or not you use X macros. However, if you don’t insist on assigning custom codes to your instructions, you can eliminate VALUE from the above and have the enumeration assign consecutive codes automatically, and then the array can be simply initialized in order, {foovalue, barvalue, /* ... */}. As a bonus, NINSNS above then becomes equal to the number of the instructions and the size of any such array, which is why I called it that.
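A small sketch of the designated-initializer case, assuming a hypothetical per-instruction table of operand counts (the table, its values, and the helper operands_for are invented for this illustration). Note that with explicitly assigned codes, NINSNS is one past the last listed code rather than the number of instructions, so unlisted slots are simply zero-initialized.

```c
/* Codes written out by hand here; in practice they come from the X macro. */
enum { FOO = 1, BAR = 2, BAZ = 3, NINSNS };

/* Indexed by bytecode; entries not named (index 0 here) default to 0. */
static const int operand_count[NINSNS] = {
    [FOO] = 0,
    [BAR] = 2,
    [BAZ] = 1,
};

/* Bounds-checked lookup; returns -1 for codes outside the table. */
int operands_for(int id)
{
    return (id >= 0 && id < NINSNS) ? operand_count[id] : -1;
}
```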
There are more tricks you can use here. For example, if some instructions have variants for several data types, the instruction list X macro can call the type list X macro to generate the variants automatically. (The somewhat ugly second option of storing the X macro list in a large macro and not an include file may be more handy here.) The INSN macro may take additional arguments such as the mode name, which would be ignored in the code list but used to call the appropriate decoding routine in the disassembler. You can use the token pasting operator ## to add prefixes to the names of the constants, as in INSN_ ## NAME to generate INSN_FOO, INSN_BAR, etc. And so on.
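Putting the pieces together, here is a self-contained sketch that uses the single-macro form of the list and the ## prefix trick. It differs slightly from the snippets above in that the list takes the per-entry macro as a parameter (INSNS(X)), which is a common variation rather than what the answer itself shows; the opcode values and the helper names AS_ENUM and AS_CASE are made up for the example.

```c
/* Hypothetical instruction list; each entry is X(name, opcode). */
#define INSNS(X) \
    X(FOO, 1)    \
    X(BAR, 64)   \
    X(BAZ, 65)

/* Constants get an INSN_ prefix through token pasting. */
enum {
#define AS_ENUM(NAME, VALUE) INSN_##NAME = VALUE,
    INSNS(AS_ENUM)
#undef AS_ENUM
    INSN_LIMIT /* one past the last listed opcode */
};

/* The same list generates the reverse mapping via stringification. */
const char *instruction_by_id(int id)
{
    switch (id) {
#define AS_CASE(NAME, VALUE) case INSN_##NAME: return #NAME;
    INSNS(AS_CASE)
#undef AS_CASE
    default: return "???";
    }
}
```

Adding or renaming an instruction now means touching only the INSNS list; both the enum and the lookup function follow automatically.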
Say I have two device drivers and I want them to share the same interface so a caller doesn't know which driver it is talking to exactly. How would I organize this in C? I have thought of a couple of ways:
First: Create a pair of .c/.h files for both drivers with the same interface and create a switch in the caller:
//main.c:
#ifdef USING_DRIVER_1
#include "driver_1.h"
#else
#include "driver_2.h"
#endif // USING_DRIVER_1
Second: Use a single header and create a file-long switch in the drivers' source file like so:
//driver_1.c:
#ifdef USING_DRIVER_1
#include "driver.h"
bool func(uint32_t var)
{
return foo(var); // assuming foo() reports success as a bool
}
#endif // USING_DRIVER_1
//driver_2.c:
#ifndef USING_DRIVER_1
#include "driver.h"
bool func(uint32_t var)
{
return bar(var); // assuming bar() reports success as a bool
}
#endif // !USING_DRIVER_1
Third: This one is a lot like the second one but instead of using switch statements in files themselves, a specific driver is chosen in the makefile or IDE equivalent:
#makefile:
SRC = main.c
#SRC += driver_1.c
SRC += driver_2.c
I'm sure one of these is superior to others and there are probably some I haven't thought of. How is it done in practice?
EDIT:
Details about my particular system: my target is an ARM microcontroller and my dev. environment is an IDE. Device drivers are for two different revisions and will never be used at the same time so each build should contain only one version. Devices themselves are modems operating via AT commands.
All three variants are actually useful. Which to choose depends on what you actually need:
Selecting the driver from the caller would add both drivers to the code. That only makes sense if you switch drivers at run-time. Then it would be the (only) way to go. Use e.g. function pointers or two identical const structs which provide the interface (function pointer and possibly other data).
A global switch is plain ugly and not possible across functions and declarations. Better would be conditional compilation using #if ... #elif ... #endif. That makes sense if the two drivers have only minor differences, e.g. different SPI interfaces (SPI1 vs. SPI2 ...). Then this is the way to go. With some effort in the build tool you can even use this for case 1 (one file for two different drivers, but that is not my recommendation).
If both drivers are substantially different in their implementation, but have the same interface, take the third approach, but use a single header for both drivers (see below).
Note that for all but the first approach, both drivers have to provide an identical interface to the application. The first approach actually allows for differences, but that would require the user code to treat them differently, and that's likely not what you want.
Using a single header file for both drivers (e.g.: "spi_memory.h" and "spi_flash.c" vs. "spi_eeprom.c") does ensure the application does not see an actual difference - as long as the drivers also behave identically, of course. Minor differences can be caught by variables in the interface (e.g. extern size_t memory_size;) or functions (the better approach).
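For the "minor differences" case, a sketch of what the conditional compilation might look like. MODEM_REVISION, MODEM_MAX_PAYLOAD, the payload sizes, and modem_can_send are all hypothetical names and values chosen for illustration; in a real project the revision would normally be set by the build system (e.g. with -DMODEM_REVISION=1) rather than defaulted in the source.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef MODEM_REVISION
#define MODEM_REVISION 2          /* fallback so the sketch builds as-is */
#endif

/* Only the parts that differ between revisions are switched. */
#if MODEM_REVISION == 1
#define MODEM_MAX_PAYLOAD 128u
#elif MODEM_REVISION == 2
#define MODEM_MAX_PAYLOAD 512u
#else
#error "unsupported MODEM_REVISION"
#endif

/* Shared interface function: identical signature for every revision. */
bool modem_can_send(uint32_t length)
{
    return length <= MODEM_MAX_PAYLOAD;
}
```

The application only ever sees modem_can_send, so it cannot tell which revision was compiled in.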
I recommend using pointers to functions. For example:
struct driver_api {
bool (*pFunc)(uint32_t);
} DriverApi;
void initializeApi(struct driver_api *pApi);
// driver1.c:
void initializeApi(struct driver_api *pApi)
{
pApi->pFunc = bar;
}
// driver2.c:
void initializeApi(struct driver_api *pApi)
{
pApi->pFunc = foo;
}
Another thing you might consider is removing the #ifndef USING_DRIVER_1 checks from your source files. Use a build system (e.g. make) and specify which source files should be included in the project. Then, based on some compile time option (such as a command line argument) include driver1.c or driver2.c, but never both.
The "advantage" of the pointers is that you can compile both APIs and then decide at runtime (even changing it mid run, for whatever reason).
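A minimal sketch of the run-time variant using two const structs of function pointers. Everything here (the modem_a/modem_b names, the stand-in send functions, and select_driver) is hypothetical and only shows the shape of the technique; real drivers would talk to the hardware instead of returning dummy results.

```c
#include <stdbool.h>
#include <stdint.h>

/* The shared interface: one table of function pointers per driver. */
struct driver_api {
    bool (*send)(uint32_t);
};

/* Stand-in implementations with deliberately different behaviour. */
static bool modem_a_send(uint32_t value) { return value != 0; }
static bool modem_b_send(uint32_t value) { return value < 100; }

static const struct driver_api modem_a = { modem_a_send };
static const struct driver_api modem_b = { modem_b_send };

/* Chooses a driver at run time; revision 1 maps to modem_a, else modem_b. */
const struct driver_api *select_driver(int revision)
{
    return revision == 1 ? &modem_a : &modem_b;
}
```

The caller keeps a const struct driver_api * and calls api->send(x) without knowing which driver is behind it. The price is that both drivers end up in the binary, which matters on a small microcontroller.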
Ok, I'm looking to define a set of memory addresses as constants in a .h file that's used by a bunch of .c files (we're in C, not C++). I want to be able to see the name of the variable instead of just seeing the hex address in the debugger... so I want to convert the #defines I currently have into constants that are global in scope. The problem is, if I define them like this:
const short int SOME_ADDRESS = 0x0010;
then I get the dreaded "multiple declarations" error since I have multiple .c files using this same .h. I would like to use an enum, but that won't work since it defaults to type integer (which is 16 bits on my system... and I need to have finer control over the type).
I thought about putting all the addresses in a struct... but I have no way (that I know of) of setting the default values of the instance of the structure in the header file (I don't want to assume that a particular .c file uses the structure first and fills it elsewhere.. I'd really like to have the constants defined in the .h file)
It seemed so simple when I started, but I don't see a good way of defining a globally available short int constant in a header file... anyone know a way to do this?
thanks!
Declare the constants in the header file using extern:
extern const short int SOME_ADDRESS;
then in any, but only one, .c file provide the definition:
const short int SOME_ADDRESS = 0x0010;
If you're compiling with gcc, you can add the -ggdb3 switch, which will tell gcc to store macro information (i.e. #defines) so that they can be used inside gdb.
Suppose I would like to declare a set of constants in C (representing the error codes of an application). How would you divide them into files? Would you use an enum in an external file and include it?
Thanks,
James
Yes, #defines or enums in a .h file is the way to go. Enums are useful if you're debugging with a debugger like gdb as you'll see a more descriptive value than a number.
If it's a set of related numeric values, then an enum is the correct approach. If the values are C strings or otherwise not representable by an enum (or if they don't sensibly form a set of related values), you can use one of two approaches.
Either use preprocessor #define statements, or use extern const-marked variables. The former is resolved at compile-time, so you can use it to specify array lengths or to actively call code when used. The latter, however, allows you to change the value of the constant (by specifying it in a .c file rather than a .h file) without having to recompile every file that uses it.
Because extern const-marked variables can be changed in that fashion, they are preferable in code that is reused across many projects, or code that is distributed as a library. Changes to the library are then possible without forcing programs to be recompiled.
If it's a set of values, an enumeration declared in a header file would suffice (some people use #defines, but since the actual value doesn't matter, an enumeration works just fine in this case). If you simply want to compare against error codes, this is a good method.
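As a sketch of how the answers above combine, the snippet below shows an error-code enum that would live in a header next to an extern const declaration, with the definition in exactly one .c file. The names app_error, APP_NAME, and is_failure are hypothetical, and the three parts are shown in one block only so the example is self-contained.

```c
/* --- would go in a header, e.g. errors.h --- */
enum app_error {
    APP_OK = 0,
    APP_ERR_NOMEM,
    APP_ERR_IO
};

/* Declaration only; safe to include from many .c files. */
extern const char *const APP_NAME;

/* --- would go in exactly one .c file --- */
const char *const APP_NAME = "demo";

/* --- any caller --- */
int is_failure(enum app_error e)
{
    return e != APP_OK;
}
```

Changing APP_NAME then requires recompiling only the one file that defines it, whereas changing an enumerator requires recompiling every file that includes the header.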