C: const initializer and debugging symbols

In code reviews I ask for option (1) below to be used, as it results in a symbol being created (for debugging), whereas (2) and (3) do not appear to do so, at least for gcc and icc. However (1) is not a true const and cannot be used on all compilers as an array size. Is there a better option for C that includes debug symbols and is truly const?
Symbols:
gcc f.c -ggdb3 -g ; nm -a a.out | grep _sym
0000000100000f3c s _symA
0000000100000f3c - 04 0000 STSYM _symA
Code:
static const int symA = 1; // 1
#define symB 2 // 2
enum { symC = 3 }; // 3
GDB output:
(gdb) p symA
$1 = 1
(gdb) p symB
No symbol "symB" in current context.
(gdb) p symC
No symbol "symC" in current context.
And for completeness, the source:
#include <stdio.h>
static const int symA = 1;
#define symB 2
enum { symC = 3 };
int main (int argc, char *argv[])
{
printf("symA %d symB %d symC %d\n", symA, symB, symC);
return (0);
}

The -ggdb3 option should be giving you macro debugging information. But this is a different kind of debugging information (it has to be different - it tells the debugger how to expand the macro, possibly including arguments and the # and ## operators) so you can't see it with nm.
If your goal is to have something that shows up in nm, then I guess you can't use a macro. But that's a silly goal; you should want to have something that actually works in a debugger, right? Try print symC in gdb and see if it works.
Since macros can be redefined, gdb requires the program to be stopped at a location where the macro existed so it can find the correct definition. In this program:
#include <stdio.h>
int main(void)
{
#define X 1
printf("%d\n", X);
#undef X
printf("---\n");
#define X 2
printf("%d\n", X);
}
If you break on the first printf and print X you'll get the 1; next to the second printf and gdb will tell you that there is no X; next again and it will show the 2.
Also the gdb command info macro foo can be useful, if foo is a macro that takes arguments and you want to see its definition rather than expand it with a specific set of arguments. And if a macro expands to something that's not an expression, gdb can't print it so info macro is the only thing you can do with it.
For better inspection of the raw debugging information, try objdump -W instead of nm.
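For example, assuming the original f.c above is compiled with macro information (gcc -ggdb3 f.c) and the program is stopped inside main (as explained above, gdb needs a program location to pick the right macro definition), a session might look roughly like this; exact paths and numbering depend on your gcc/gdb versions:
(gdb) break main
(gdb) run
(gdb) print symB
$1 = 2
(gdb) info macro symB
Defined at f.c:3
#define symB 2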

However (1) is not a true const and cannot be used on all compilers as an array size.
This can be used as an array size on compilers that support C99 and later (gcc, clang), although strictly speaking a const int is not an integer constant expression in C, so at block scope the resulting array is a variable-length array. For others (like MSVC) you have only the last two options.
Using option 3 is preferred over 2: enums are different from #define constants. They follow scope rules, the compiler knows their type, and they are visible when debugging. Note that neither enum constants nor #define constants can be used as l-values; both are constant expressions, not objects.
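For reference, a minimal sketch contrasting the three options (the file name and values are illustrative; compile with gcc -g3 so macro information is also available):
#include <stdio.h>

static const int symA = 1;  /* (1) a real object: has an address and a debug symbol */
#define symB 2              /* (2) a macro: only visible to gdb via -g3 macro info */
enum { symC = 3 };          /* (3) an enumerator: a true constant, visible in gdb */

int main(void)
{
    char bufC[symC];        /* OK everywhere: symC is an integer constant expression */
    char bufA[symA];        /* C99 and later only: a VLA, since symA is not a constant expression */
    printf("%zu %zu %d %d %d\n", sizeof bufA, sizeof bufC, symA, symB, symC);
    return 0;
}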

Related

Why is the `NUMBER` in the K&R "Reverse Polish Calculator" showing as void in gdb?

Following the K&R Reverse Polish Calculator, I reduced the main function in order to get a better understanding:
#include <stdio.h>
#include <stdlib.h>
#define NUMBER '0'
#define MAXOP 5
void push(double);
int pop(void);
int getop(char []);
int main(){
int type;
char s[MAXOP];
double op2;
while ((type=getop(s))!=EOF){
switch(type):
case NUMBER:
push(atof(s));
printf("\t%s\n",s);
}
}
#define MAXVAL 100
char val[MAXVAL];
int sp;
void push(double f){
if (sp<MAXVAL)
val[sp++]=f;
}
int pop(void){
if (sp>0)
return val[--sp];
}
#include <ctype.h>
int getch(void);
void ungetch(int);
int getop(char s[]){
int i,c;
while (s[0]=c=getch())==' '||c=='\t')
;
s[1]='\0';
if (!isdigit(c)&&c!='.')
return c;
i=0;
if (isdigit(c))
while (isdigit(s[++i]=c=getch()))
;
if (c=='.')
while (isdigit(s[++i]=c=getch()))
;
s[i]='\0';
if (c!=EOF)
ungetch(c);
return NUMBER;
}
#define BUFSIZE 100
char buf[BUFSIZE];
int bufp=0;
int getch(void){
return (bufp>0)?buf[--bufp]:getchar();
}
int ungetch(int c){
if (bufp>=BUFSIZE)
printf("ungetch: too many characters\n");
else
buf[bufp++]=c;
}
I can see that MAXOP 5 is /* max size of operand or operator */, which is being defined as an external variable using #define. What I can't figure out is how I can actually track the value of MAXOP at each stage of the program run, using gdb.
After I have provided the number 10 to getchar(), while debugging:
14 while ((type=getop(s))!=EOF){
(gdb) n
Breakpoint 14, getop (s=0x7efff5dc "\n") at t.c:47
47 while ((s[0]=c=getch())==' '||c=='\t')
(gdb) p c
$22 = 10
(gdb) n
Breakpoint 31, getch () at t.c:72
72 return (bufp>0)?buf[--bufp]:getchar();
(gdb) n
10
Breakpoint 34, getch () at t.c:73
73 }
(gdb) n
At some point, when reaching the end of the getop function:
Breakpoint 30, getop (s=0x7efff5dc "10") at t.c:62
62 return NUMBER;
(gdb) p number
No symbol "number" in current context.
(gdb) p (NUMBER)
No symbol "NUMBER" in current context.
(gdb) p $NUMBER
$39 = void
(gdb) n
63 }
(gdb) n
Breakpoint 2, main () at t.c:15
15 switch(type){
(gdb) p type
$40 = 48
(gdb) p NUMBER
No symbol "NUMBER" in current context.
(gdb) p /s NUMBER
No symbol "NUMBER" in current context.
(gdb) p /d $NUMBER
$41 = Value can't be converted to integer.
(gdb) p $NUMBER
$42 = void
Questions:
Can the value of NUMBER be accessed from the Linux shell after the above program has been compiled and run? In other words, does the preprocessing directive #define NUMBER '0' create an external variable NUMBER that is the same as, for instance, the variable $PATH on Linux?
Why does the p $NUMBER command show a void value for the external variable NUMBER?
Why does the p NUMBER command show No symbol "NUMBER" in current context.? Does it mean that the external variable is blocked for gdb?
Can the value of NUMBER be accessed from the Linux shell after the above program has been compiled and run? In other words, does the preprocessing directive #define NUMBER '0' create an external variable NUMBER that is the same as, for instance, the variable $PATH on Linux?
No, fortunately preprocessor symbols and C symbols are not mapped to shell variables when you execute a program.
Why does the p $NUMBER command show a void value for the external variable NUMBER?
Why does the p NUMBER command show No symbol "NUMBER" in current context.? Does it mean that the external variable is blocked for gdb?
NUMBER is a preprocessor symbol: it disappears during the preprocessing phase because it is replaced by its value. The compiler itself never sees that symbol in the source it compiles, so it cannot put information about it in the debug data (e.g. tags), and it is therefore unknown to the debugger.
So p $NUMBER is the equivalent of p $KQHJDSFKJQHKJSDHKJHQSJHDKJHQKJHDSJHSQD and prints void,
and p NUMBER is the equivalent of p KQHJDSFKJQHKJSDHKJHQSJHDKJHQKJHDSJHSQD and says the symbol doesn't exist.
If I run just the preprocessing phase, after commenting out your #include lines (so as not to get thousands of lines from them):
/tmp % gcc -E c.c
# 1 "c.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "c.c"
void push(double);
int pop(void);
int getop(char []);
int main(){
int type;
char s[5];
double op2;
while ((type=getop(s))!=EOF){
switch(type):
case '0':
push(atof(s));
printf("\t%s\n",s);
}
}
char val[100];
int sp;
void push(double f){
if (sp<100)
val[sp++]=f;
}
int pop(void){
if (sp>0)
return val[--sp];
}
int getch(void);
void ungetch(int);
int getop(char s[]){
int i,c;
while (s[0]=c=getch())==' '||c=='\t')
;
s[1]='\0';
if (!isdigit(c)&&c!='.')
return c;
i=0;
if (isdigit(c))
while (isdigit(s[++i]=c=getch()))
;
if (c=='.')
while (isdigit(s[++i]=c=getch()))
;
s[i]='\0';
if (c!=EOF)
ungetch(c);
return '0';
}
char buf[100];
int bufp=0;
int getch(void){
return (bufp>0)?buf[--bufp]:getchar();
}
int ungetch(int c){
if (bufp>=100)
printf("ungetch: too many characters\n");
else
buf[bufp++]=c;
}
/tmp %
As you can see, NUMBER, MAXOP, MAXVAL and BUFSIZE are replaced by their values.
C’s #define statement does not create an external variable. It creates what is called a macro.
Macros are replaced during program translation, before or early in compilation. For example, with #define NUMBER '0', the result is as if every instance of NUMBER in the source code were replaced with '0'.
Regarding your specific questions:
These macro definitions are typically not tracked in the debugging information that the compiler produces (although such tracking may be offered as a feature), and they are not made visible to command shells or the debugger.
In GDB, $foo refers to a GDB variable named foo, not a program variable named foo. GDB provides these separate variables as a convenience to use during debugging. They are for interacting with GDB and do not come from the program. So the command p $NUMBER asks GDB to print the value of its own variable named NUMBER. There is no such variable, so GDB reports it as void.
p NUMBER shows “No symbol "NUMBER" in current context” because there is no symbol NUMBER that is known to GDB.
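For instance, GDB convenience variables spring into existence on first use and are unrelated to the program; a rough illustration (output may vary by gdb version):
(gdb) p $NUMBER
$1 = void
(gdb) set $NUMBER = 48
(gdb) p $NUMBER
$2 = 48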
I can see that you have some very dire misunderstanding of the C language syntax. Not to berate you, but have you tried learning C from some other source? K&R is a great book, but it is notoriously concise and assumes that you already know programming. Try going through the lists here: The Definitive C Book Guide and List
======
NUMBER, MAXOP and MAXVAL are constants. They are defined through a pre-processor directive, and are NOT variables. And definitely not external variables which is a vastly different concept.
When you write #define NUMBER '0', it instructs the compiler to replace every instance of NUMBER in the source with '0'. It is simple search and replace on your original source code. It does not create a variable and you cannot assign a value to it. So, asking to follow the value of a #define'ed value makes no sense. It is always going to be the same value that was written in the source.
Also, please be clear that there is no direct relation between variables you define in your program and the environment variables on your system.
About the next two questions, the short answer is, "Because GDB doesn't know they exist".
Longer Answer: As mentioned earlier, your pre-processor directives are simply instructions to your compiler for a search and replace. Once done with them, there is no need to keep them around for any longer and hence the compiler will discard them.
GDB only knows as much about your program as is available in the final binary that the compiler generates. If the compiler doesn't mention anything about NUMBER in the binary, GDB cannot even know that it ever existed.
Now, that does not mean that it is impossible to see this data in GDB. When compiling, you can pass the -ggdb3 option to GCC so that it generates GDB-specific debugging information. This includes detailed information about the program, including all the macros and pre-processor directives. With this extra flag you can see the values of your #define'd constants; however, remember, they will never change. This is generally only useful for examining function-like macros, which is a much more advanced topic.
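If the goal is simply to be able to print NUMBER in gdb without relying on macro debug information, one alternative (a sketch, not K&R's original code) is to define it as an enumerator instead of a macro, exactly as discussed in the first question above:
enum { NUMBER = '0' };   /* instead of: #define NUMBER '0' */
/* NUMBER is now a real symbol with a type, so "p NUMBER" works in gdb. */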

Why can't I read a C constant from Golang properly?

I am using go-hdf5 to read an hdf5 file into golang. I am on Windows 7, using a pretty recent copy of mingw and hdf5 1.8.14_x86, and it seems like trying to use any of the predefined types doesn't work; let's focus for example on H5T_NATIVE_UINT64. I have reduced the issue to the following, which basically leaves go-hdf5 out of the problem and points at something quite fundamental going wrong:
package main
/*
#cgo CFLAGS: -IC:/HDF_Group/HDF5/1.8.14_x86/include
#cgo LDFLAGS: -LC:/HDF_Group/HDF5/1.8.14_x86/bin -lhdf5 -lhdf5_hl
#include "hdf5.h"
#include <stdio.h>
void print_the_value2() { printf("the value of the constant is %d\n", H5T_NATIVE_UINT64); }
*/
import "C"
func main() {
C.print_the_value2()
}
You obviously need to have hdf5 installed and point the compiler at the headers/dlls; after running go get, executing this prints the following on my pc:
the value of the constant is -1962924545
Running variations of the above, in how/where the constant is read, will give different answers for the value of H5T_NATIVE_UINT64. However I am pretty sure that none of them are the right value, and in fact trying to use a type with the returned id doesn't work, unsurprisingly.
If I write and run a "real" C program, I get different results
#include <stdio.h>
#include "hdf5.h"
hid_t _go_hdf5_H5T_NATIVE_UINT64() { return H5T_NATIVE_UINT64; }
int main()
{
printf("the value of the constant is %d", _go_hdf5_H5T_NATIVE_UINT64());
}
Compiling using
C:\Temp>gcc -IC:/HDF_Group/HDF5/1.8.14_x86/include -LC:/HDF_Group/HDF5/1.8.14_x86/bin -lhdf5 -lhdf5_hl -o stuff.exe stuff.c
and running gives me
the value of the constant is 50331683
And that appears to be the right value as I can use it directly from my go program. Obviously I want to be able to use the constants instead. Any idea why this could be happening?
Extra info following comments below:
I looked for the definition of H5T_NATIVE_UINT64 in the hdf5 headers and see the following
c:\HDF_Group\HDF5\1.8.14_x86\include>grep H5T_NATIVE_UINT64 *
H5Tpkg.h:H5_DLLVAR size_t H5T_NATIVE_UINT64_ALIGN_g;
H5Tpublic.h:#define H5T_NATIVE_UINT64 (H5OPEN H5T_NATIVE_UINT64_g)
H5Tpublic.h:H5_DLLVAR hid_t H5T_NATIVE_UINT64_g;
The whole header is here
http://www.hdfgroup.org/ftp/HDF5/prev-releases/hdf5-1.8.14/src/unpacked/src/H5Tpublic.h
Thanks!
H5T_NATIVE_UINT64 is NOT a constant but a #define that ultimately expands to (H5open(), H5T_NATIVE_UINT64_g), which cgo does not understand.
It's easy to check by turning on debug output on gcc's preprocessor:
gcc -E -dM your_test_c_file.c | grep H5T_NATIVE_UINT64
Result:
#define H5T_NATIVE_UINT64 (H5OPEN H5T_NATIVE_UINT64_g)
Now the same for H5OPEN:
gcc -E -dM test_go.c | grep '#define H5OPEN'
gives:
#define H5OPEN H5open(),
Right now, cgo does understand simple integer constant defines like #define VALUE 1234, or anything that the gcc preprocessor will turn into an integer constant. See the function func (p *Package) guessKinds(f *File) in $GOROOT/src/cmd/cgo/gcc.go.
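In other words, after preprocessing the compiler sees a comma expression, not a constant. Roughly (a simplified sketch of what the expansion means, not the exact header contents):
/* H5OPEN expands to "H5open()," so the whole macro becomes a comma expression:
   first call the library initializer, then read a global variable whose value
   is only assigned at run time - nothing cgo can treat as a constant. */
hid_t id = (H5open(), H5T_NATIVE_UINT64_g);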

Print all defined macros

I'm attempting to refactor a piece of legacy code and I'd like a snapshot of all of the macros defined at a certain point in the source. The code imports a ridiculous number of headers etc. and it's a bit tedious to track them down by hand.
Something like
#define FOO 1
int myFunc(...) {
PRINT_ALL_DEFINED_THINGS(stderr)
/* ... */
}
Expected somewhere in the output
MACRO: "FOO" value 1
I'm using gcc but have access to other compilers if they are easier to accomplish this task.
EDIT:
The linked question does not give me the correct output for this:
#include <stdio.h>
#define FOO 1
int main(void) {
printf("%d\n", FOO);
}
#define FOO 0
This very clearly prints 1 when run, but gcc test.c -E -dM | grep FOO gives me 0
To dump all defines you can run:
gcc -dM -E file.c
Check GCC dump preprocessor defines
Every define that it dumps will show the value it was defined with (or last redefined to); you won't be able to see the value a macro had at each particular point in the code.
You can also add the -Wunused-macros option to warn about macros that are defined in the main file but never used (gcc already warns by default when a macro is redefined to a different value without an intervening #undef).
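If you want to see where a definition changes rather than just its final value, gcc's -dD option (a related dump flag) emits the #define directives together with the normal preprocessed output, so you can read off which definition was active at a given point; for example:
gcc -E -dD test.c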

Is math within macro computed at compile time?

For example, does MIN_N_THINGIES below compile to 2? Or will I recompute the division every time I use the macro in code (e.g. recomputing the end condition of a for loop each iteration).
#define MAX_N_THINGIES (10)
#define MIN_N_THINGIES ((MAX_N_THINGIES) / 5)
uint8_t i;
for (i = 0; i < MIN_N_THINGIES; i++) {
printf("hi");
}
This question stems from the fact that I'm still learning about the build process. Thanks!
If you pass -E to gcc it will show what the preprocessor stage outputted.
gcc -E test.c | tail -n11
Outputs:
# 3 "test.c" 2
int main() {
uint8_t i;
for (i = 0; i < ((10) / 5); i++) {
printf("hi");
}
return 0;
}
Then if you pass the -S flag to gcc you will see that the division was optimized out. If you also pass the -o flag you can set the output files and diff them to see that they generated the same code.
gcc -S test.c -o test-with-div.s
edit test.c to make MIN_N_THINGIES equal a const 2
gcc -S test.c -o test-constant.s
diff test-with-div.s test-constant.s
// for educational purposes you should look at the .s files generated.
Then as mentioned in another comment you can change the optimization flag by using -O...
gcc -S test.c -O2 -o test-unroll-loop.s
will unroll the for loop, such that there isn't even a loop anymore.
Preprocessor will replace MIN_N_THINGIES with ((10)/5), then it is up to the compiler to optimize ( or not ) the expression.
Maybe. The standard does not mandate whether it is or is not. On most compilers it will be computed at compile time once optimization flags are passed (for example gcc with -O0 does not do it, while with -O2 it even unrolls the loop).
Modern compilers perform even more complicated techniques (vectorization, loop skewing, blocking ...). However, unless you really care about performance (for example you write HPC or real-time systems), you probably should not care about the output of the compiler, unless you're just interested (and yes, compilers can be a fascinating subject).
No. The preprocessor does not calculate macros; the resulting expressions are handled by the compiler. The preprocessor can calculate arithmetic expressions (no floating point values) in #if conditionals, though.
Macros are simply text substitutions.
Note that the expanded macros can still be calculated and optimized by the compiler, it's just that it's not done by the preprocessor.
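For example, a small sketch of preprocessor arithmetic in an #if conditional (the macro names here are made up for illustration):
#define MAX_N_THINGIES 10

/* The preprocessor itself evaluates this integer expression when deciding
   which branch of the conditional to keep. */
#if (MAX_N_THINGIES / 5) >= 2
#define BUFFER_LABEL "large"
#else
#define BUFFER_LABEL "small"
#endif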
The standard mandates that some expressions are evaluated at compile time. But note that the preprocessor does just text splicing (well, almost) when the macro is called, so if you do:
#define A(x) ((x) / (S))
#define S 5
A(10) /* Gives ((10) / (5)) == 2 */
#undef S
#define S 2
A(20) /* Gives ((20) / (2)) == 10 */
The parentheses are to avoid idiocies like:
#define square(x) x * x
square(a + b) /* Gets you a + b * a + b, not the expected square */
After preprocessing, the result is passed to the compiler proper, which does (most of) the compile-time computation that the standard requests. Most compilers will do a lot of constant folding, i.e., computing (sub)expressions made of known constants, as this is simple to do.
To see the expansions, it is useful to write a *.c file of a few lines, just with the macros to check, run it through the preprocessor only (typically something like cc -E file.c) and check the output.

Dynamic obfuscation by self-modifying code

Here is what I am trying to do:
Assume you have two functions:
void f1(int *v)
{
*v = 55;
}
void f2(int *v)
{
*v = 44;
}
char *template;
template = allocExecutablePages(...);
char *allocExecutablePages (int pages)
{
template = (char *) valloc (getpagesize () * pages);
if (mprotect (template, getpagesize (),
PROT_READ|PROT_EXEC|PROT_WRITE) == -1) {
perror ("mprotect");
}
return template;
}
I would like to compare f1 and f2 (i.e. tell what is identical and what is not), by getting the assembly or machine-code lines of those functions and making a line-by-line comparison,
and then put those lines in my template.
Is there a way in C to do that?
Thanks
Update
Thanks for all your answers, but maybe I haven't explained my need correctly.
Basically, I'm trying to write a little obfuscation method.
The idea consists in letting two or more functions share the same location in memory. A region of memory (which we will call a template) is set up containing some of the
machine code bytes from the functions, more specifically, the ones they all
have in common. Before a particular function is executed, an edit script is used
to patch the template with the necessary machine code bytes to create a
complete version of that function. When another function assigned to the same
template is about to be executed, the process repeats, this time with a
different edit script. To illustrate this, suppose you want to obfuscate a
program that contains two functions f1 and f2. The first one (f1) has the
following machine code bytes
Address Machine code
0 10
1 5
2 6
3 20
and the second one (f2) has
Address Machine code
0 10
1 9
2 3
3 20
At obfuscation time, one will replace f1 and f2 by the template
Address Machine code
0 10
1 ?
2 ?
3 20
and by the two edit scripts e1 = {1 becomes 5, 2 becomes 6} and e2 = {1
becomes 9, 2 becomes 3}.
#include <stdlib.h>
#include <string.h>
typedef unsigned int uint32;
typedef char * addr_t;
typedef struct {
uint32 offset;
char value;
} EDIT;
EDIT script1[200], script2[200];
char *template;
int template_len, script_len = 0;
typedef void(*FUN)(int *);
int val, state = 0;
void f1_stub ()
{
if (state != 1) {
patch (script1, script_len, template);
state = 1;
}
((FUN)template)(&val);
}
void f2_stub () {
if (state != 2) {
patch (script2, script_len, template);
state = 2;
}
((FUN)template)(&val);
}
int new_main (int argc, char **argv)
{
f1_stub ();
f2_stub ();
return 0;
}
void f1 (int *v) { *v = 99; }
void f2 (int *v) { *v = 42; }
int main (int argc, char **argv)
{
int f1SIZE, f2SIZE;
/* makeCodeWritable (...); */
/* template = allocExecutablePages(...); */
/* Computed at obfuscation time */
diff ((addr_t)f1, f1SIZE,
(addr_t)f2, f2SIZE,
script1, script2,
&script_len,
template,
&template_len);
/* We hide the proper code */
memset (f1, 0, f1SIZE);
memset (f2, 0, f2SIZE);
return new_main (argc, argv);
}
So now I need to write the diff function, which will take the addresses of my two functions and generate the template with the associated edit scripts.
That is why I would like to compare my two functions byte by byte.
Sorry that my first post was not very understandable!
Thank you
Do you want to do this at runtime or during authorship?
You can probably instruct your C compiler to produce assembly language output; for example, gcc has the -S option, which will produce output in file.s. Your compiler suite may also have a program like objdump which can disassemble an object file or entire executable. However, you generally want to leave optimizations up to a modern compiler rather than do it yourself.
At runtime the & operator can take the address of a function and you can read through it, though you have to be prepared for the possibility of encountering a branch instruction before anything interesting, so you actually have to programmatically "understand" at least a subset of the instruction set. What you will run into when reading function pointers will of course vary all over the place by machine, ABI, compiler, optimization flags, etc.
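A minimal sketch of that byte-by-byte reading, assuming you may legally read the function bodies and that you supply the sizes yourself (both are platform- and compiler-specific assumptions, and reading code bytes through a function pointer is not portable C):
#include <stdio.h>
#include <stddef.h>

static void f1 (int *v) { *v = 55; }
static void f2 (int *v) { *v = 44; }

int main (void)
{
    /* FUNC_SIZE is a made-up bound: real code needs a reliable way to
       determine how long each compiled function actually is. */
    enum { FUNC_SIZE = 16 };
    const unsigned char *a = (const unsigned char *) (void *) f1;
    const unsigned char *b = (const unsigned char *) (void *) f2;
    size_t i;

    for (i = 0; i < FUNC_SIZE; i++) {
        if (a[i] != b[i])
            printf ("offset %zu: %02x vs %02x\n", i, a[i], b[i]);
    }
    return 0;
}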
Put the functions into t1.c and t2.c and use gcc -S to generate assembly output:
gcc -S t1.c
gcc -S t2.c
Now compare t1.s and t2.s.
If you are using Visual Studio, go to
Project Properties -> Configuration -> C/C++ -> Output Files -> Assembler output
or use the compiler switches /FA, /FAc, /FAs, /FAcs. Lower-case c adds machine code, and s shows the source code side-by-side with the assembly code. And don't forget to disable compiler optimizations.
Having read through some of the answers and the comments there, I'm not sure I fully understand your question, but maybe you're looking for a gcc invocation like the following:
gcc -S -xc - -o -
This tells gcc to input C code from stdin and output assembly to stdout.
If you use a vi-like editor, you can highlight the function body in visual mode and then run the command:
:'<,'>!gcc -S -xc - -o - 2> /dev/null
...and this will replace the function body with assembly (the "stderr > /dev/null" business is to skip errors about #include's).
You could otherwise use this invocation of gcc as part of a pipeline in a script.
