Prevent my C code from printing (seriously slows down the execution) - c

I have an issue.
I finally found a way to use an external library to solve my numerical systems. This library automatically prints the matrices. It is fine for dim=5, but for dim=1.000.000, you understand the problem...
Those parasitic printf calls slow down the execution considerably, and I would like to get rid of them. The problem is: I don't know where they are! I looked in every ".h" and ".c" file in my library: they are nowhere to be found.
I suspect they are already compiled into the library itself: superlu.so. So I can't get at them.
How could I possibly prevent my C code from printing anything during the execution ?
Here is my Makefile. I use the libsuperlu-dev library, directly downloaded from Ubuntu. The .so file was already there.
LIB = libsuperlu.so
main: superlu.o read_file.o main.o sample_arrays.o super_csr.o
cc $^ -o $@ $(LIB)
clean:
rm *.o
rm main

Just to explain the LD_PRELOAD method that was mentioned: I sometimes use it for exactly this purpose (or, conversely, to add some printf calls, for example when I want to pipe the output of a GUI). Here is how you can do a rudimentary version of it.
myprint.c:
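/* Stub that shadows printf: ignores its arguments and prints nothing. */
int printf(const char *format, ...){
    (void)format;
    return 0;
}
/* Stub that shadows putchar in the same way. */
int putchar(int c){
    (void)c;
    return 0;
}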
Then
gcc -shared -fPIC -std=gnu99 -o myprint.so myprint.c
Then
LD_PRELOAD=./myprint.so ./main
This forces your printf and putchar symbols to be loaded before any other library has the opportunity to provide them. So no printing occurs, at least none through printf or putchar. You may have to add some other functions to the list, such as fprintf, fputc, fputs, puts, ...
And of course, another problem with overriding the f-something functions (and possibly the others too) is that you might also prevent some wanted behavior, such as writing files or interacting with some devices.
It may be even worse if the printing is done with the low-level write function. That one you very likely can't afford to override, unless you replace it with a function that calls the real write (loaded manually with dlopen) and filters out only the calls you want to avoid, based on the target file descriptor (1) or on the content of the written data.
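A minimal sketch of such a filtering wrapper, under stated assumptions: it targets Linux, and instead of looking up the real write with dlopen it forwards to the kernel through syscall(SYS_write, ...), which sidesteps the lookup entirely. It only affects code that calls write() directly; the stdio functions inside libc may reach the kernel through internal paths that this wrapper does not see.

```c
/* quiet_write.c: sketch of a write() wrapper that drops stdout traffic.
 * Assumptions: Linux; the real write is reached via syscall(SYS_write)
 * rather than through a dlopen-based lookup. */
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall, STDOUT_FILENO */

ssize_t write(int fd, const void *buf, size_t count)
{
    if (fd == STDOUT_FILENO) {
        /* Pretend everything was written so that callers do not retry. */
        return (ssize_t)count;
    }
    /* Any other descriptor (files, pipes, stderr) passes through untouched. */
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

It would be built and preloaded the same way as myprint.so above, for example gcc -shared -fPIC -o quiet_write.so quiet_write.c followed by LD_PRELOAD=./quiet_write.so ./main.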
Note: if you want to verify whether libsuperlu.so is responsible for the printing, you can check with nm -D libsuperlu.so whether it refers to some well-known printing functions, such as printf.

Related

Where does GCC find printf ? My code worked without any #include

I am a C beginner so I tried to hack around the stuff.
I read stdio.h and I found this line:
extern int printf (const char *__restrict __format, ...);
So I wrote this code and i have no idea why it works.
code:
extern int printf (const char *__restrict __format, ...);
main()
{
printf("Hello, World!\n");
}
output:
sh-5.1$ ./a.out
Hello, World!
sh-5.1$
Where did GCC find the function printf? It also works with other compilers.
I am a beginner in C and I find this very strange.
gcc will link your program, by default, with the c library libc which implements printf:
$ ldd ./a.out
linux-vdso.so.1 (0x00007ffd5d7d3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf2d307000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdf2d4f0000)
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf' | head -1
0000000000056cf0 T printf@@GLIBC_2.2.5
If you build your program with -nolibc you have to satisfy a few symbols on your own (see Compiling without libc):
$ gcc -nolibc ./1.c
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/bin/ld: (.text+0x19): undefined reference to `__libc_csu_init'
/usr/bin/ld: (.text+0x26): undefined reference to `__libc_start_main'
/usr/bin/ld: /tmp/user/1000/ccCFGFhf.o: in function `main':
1.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status
You need to understand the difference between the compile and link phases of program compilation.
In the compilation phase you describe to the compiler the various things you intend to call that may be in this file, in other files or in libraries. This is done using function declarations.
int woodle(char*);
for example. This is what header files are full of.
If the function is in the same file then the compiler will work out how to call it while it compiles that file. But for other functions it leaves a note in the generated code that says
please wire up the woodle function here so I can call it.
This is usually called an import, and there are tools you can use to look at the imports in an object file; the name depends on the platform and toolset.
The linker's job is to find those imports and resolve them. It will look at object files passed on the command line, at libraries included on the command line, and also at the standard libraries that the C standard says should be available to all programs.
In your printf case the linker found printf in the C standard library, which the linker includes automatically.
BTW, the linker looks for 'exports' from objects and libraries; there are tools to look at those too. The linker's job is to match each 'import' to an 'export'.
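To make the split concrete, here is a small sketch with no #include at all. The two declarations are written by hand, exactly as a header would provide them, and the definitions come from the C library at link time. The helper name parse_magnitude is made up for this illustration.

```c
/* No #include here: these hand-written declarations are all the compiler
 * needs. The definitions live in the C library, which the linker adds. */
int atoi(const char *text);   /* normally declared in <stdlib.h> */
int abs(int value);           /* normally declared in <stdlib.h> */

/* Hypothetical helper: parse a decimal string and return its magnitude. */
int parse_magnitude(const char *text)
{
    return abs(atoi(text));
}
```

Compiling this with -c and running nm on the resulting object file would be expected to show atoi as an undefined symbol (an import) and parse_magnitude as a defined one (an export).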
First, realize what the gcc program is. Technically, it is not a compiler, but a compiler driver. A compiler driver is responsible for driving the various other tools which perform compilation-related tasks. Some of the tools are found in PATH, whereas others are in internal compiler directories.
There are various ways to check what the driver is doing. I won't go into much detail about how I made the rest of this post, but briefly:
strace -f -e %process gcc is a Linux-specific way of showing all the programs executed (elsewhere in this answer, I assume Linux when specifying details but it doesn't matter)
gcc -v will dump out various information, but you have to learn what parts actually matter for whatever you are doing.
there exists a "specs" file that controls some of the argument-related stuff the driver does
Now for the actual data:
Here's the tree of processes that gcc might execute:
gcc, the "driver" (input various, output various. Some arguments are handled by the driver itself, but most are passed to the various subprocesses)
(these are repeated for every input file. If -pipe is passed, temporary files are omitted and processes are run in parallel; if --save-temps is passed, intermediate files are preserved):
cc1 -E -lang-asm, the "preprocessor" for assembly code (input .S, output .s - yes, case matters. Only relevant if you're trying to compile separate ASM files that need preprocessing)
cc1 -E, the "preprocessor" for C code (input .c; output .i. Only a separate process if -fno-integrated-cpp is passed, which is rare. Note that the cpp program in PATH is never called, even though it is provided by GCC - rather, it calls this. If -E is passed, the driver stops after this)
cc1, the "compiler" proper (input (usually) .c or (rarely) .i; output .s. If -S is passed, the driver stops after this; if -fsyntax-only is passed, this stage doesn't even complete)
(For other languages, replace cc1 with cc1plus, cc1d, cc1obj, f951, gnat1, etc. Note that the different drivers like g++, gdc, etc. only affect what extra libraries are linked by default)
as, the "assembler" (input .s; output .o. This is looked up in PATH; it is shipped as part of Binutils, not GCC. If -c is passed, the driver stops here)
collect2, the "linker" wrapper (supposedly this has something to do with constructors, and potentially calls ld twice, but in practice I've never seen it. Just think of it as forwarding all its arguments to ld, even if you have constructors normally)
ld, the "linker" proper (input .o or others (assumed to be libraries); output executable or shared library. Like as, this is actually part of Binutils, not GCC, so it is looked up in PATH)
The driver has a lot of logic, so it is important that you use it. Notably, you should never call as or ld yourself, since that will omit arguments that rely on the driver's sense of "exact current platform".
Now, getting to your specific question:
Ignoring irrelevant arguments and simplifying paths, the ld call ends up looking like:
ld -o foo Scrt1.o crti.o crtbeginS.o foo.o -lgcc -lgcc_s -lc -lgcc -lgcc_s crtendS.o crtn.o
The various "crt" loose object files are a mixture of parts of GLIBC and GCC, needed to support the C runtime (note that there are others as well; which are linked depends on arguments). The gcc and gcc_s libraries are needed to run code on the platform at all; they are repeated because they rely on the c library which also relies on them.
Since -lc is passed by default (regardless of language), the printf symbol can be resolved. Notably, -lm, -lrt, -lpthread and others are not passed by default, so other symbols from different parts of the C library will not be resolved unless you pass them manually.
All of this is completely independent of what headers are included.
That your program compiles without a header present means that the compiler settings were lenient. You should still get a warning though. The reason that your program links is that the C standard library, which contains the code of the function printf, is linked automatically. Almost every C program needs it because input and output, or generally interaction with peripherals, which that library handles, are the general means of generating a "side effect", an effect outside the program. The opposite is so uncommon that one must make the wish to not link with it explicit.
So why does your compiler accept a call to a function which has not been declared?
C emerged at a time when programs were much smaller and software development as an engineering discipline didn't formally exist:
Four years later [i.e., in 1978], as a still-junior faculty member, I tried to get my colleagues [...] to create an undergraduate computer-science degree. A senior mechanical engineer of forbidding mien snorted surely not: Harvard had never offered a degree in automotive science, why would we create one in computer science? I waited until I had tenure before trying again (and succeeding) in 1982. -Harry R. Lewis
That was about 10 years after Dennis Ritchie had started to develop this versatile new programming language, the successor to B. The problems involved in creating and maintaining large programs back then were simply not as pressing and not as well-understood as they are, perhaps, today.
Among the many things that help us today, at least in most compiled languages, is strong typing. Every identifier we use is declared with a static type. But the importance and benefits of that were not that obvious in the 1970s, and early C permitted mixing and matching integers and pointers at will. It's all numbers, right? And a function is just a name for a jump address, right? The user will know what to put on the stack, and the function will read it off the stack — I really don't see a problem here ;-). This attitude brought us functions like printf().
After this stage-setting we are slowly getting to the point. Because a function is just a jump address, no function declaration needed to be present in order to call one. The assumed parameters were what you presented, and the presumed return type defaulted to int, which was often correct or at least didn't hurt. And for a long time C kept this backward compatibility. I think the C99 standard forbade the use of undeclared identifiers, and the standard drafts for C11 and C21 both say:
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator). [91]
Footnote 91 says "Thus, an undeclared identifier is a violation of the syntax." (All emphasis by me.)
All compilers I tried compile it anyway (with a warning), perhaps because some ancient code that still gets compiled frequently depends on it.

can I edit lines of code using gdb and is it also possible to save to actual source file and header file while in same debug session? linux

I have this program called parser I compiled with -g flag this is my makefile
parser: header.h parser.c
gcc -g header.h parser.c -o parser
clean:
rm -f parser a.out
code for one function in parser.c is
int _find(char *html , struct html_tag **obj)
{
char temp[strlen("<end")+1];
memcpy(temp,"<end",strlen("<end")+1);
...
...
.
return 0;
}
What I would like when I debug the parser: can I also change lines of code after hitting a breakpoint, while stepping with n through the function above? If that is not gdb's job, is there any open-source solution for actually changing the code and possibly saving it, so that when I step to the next statement, the changed statement (say, with a different array index) is the one that executes? Is there any open-source tool for this, or can it be done in gdb, and do I need some particular compile options?
I know I can assign values to variables at runtime in gdb, but is that all? Is there anything that is actually capable of changing the source as well?
Most C implementations are compiled. The source code is analyzed and translated to processor instructions. This translation would be difficult to do on a piecewise basis. That is, given some small change in the source code, it would be practically impossible to update the executable file to represent those changes. As part of the translation, the compiler transforms and intertwines statements, assigns processor registers to be used for computing parts of expressions, designates places in memory to hold data, and more. When source code is changed slightly, this may result in a new compilation happening to use a different register in one place or needing more or less memory in a particular function, which results in data moving back or forth. Merging these changes into the running program would require figuring out all the differences, moving things in memory, rearranging what is in what processor register, and so on. For practical purposes, these changes are impossible.
GDB does not support this.
(Apple’s developer tools may have some feature like this. I saw it demonstrated for the Swift programming language but have not used it.)

Is there a way to detect, a C-file is compiled directly into an executable?

I'd like to include a testing main-function in some of the C-files -- to allow them to be compiled into standalone little programs to independently test various functions.
But I don't want these multiple main-functions included in the real (big) executable/library.
Obviously, I can use my own define, such as -DINCLUDE_TEST_MAIN, but it occurred to me, that clang may already be telling me on its own. Somehow...
So, is there any way for the compiled code to detect, when it is compiled directly into an executable vs. when an object-file is being produced (with the -c flag)?
The solution needn't be universal -- I'm quite certain, a universal one does not exist -- my main compiler is clang...
I don't see a better solution than -DINCLUDE_TEST_MAIN. Probably you could create some fancy command line that would strip main out of the object file if you don't need it, but I think the -D thing is the best way to go.
I don't really get what you mean with »but it occurred to me, that clang may already be telling me on its own. Somehow...« - if you fear a name clash, then just take a name clang will definitely not use, like -DMIKHAIL_T_INCLUDE_TEST_MAIN; if this is not what you meant, then you should clarify that point.
Or, besides stuff.c, you could create stuff.main.c and test compile like so:
gcc stuff.c stuff.main.c -o stuff.test
(effectively moving main out of the file.)
You could (but shouldn't) use the -Xlinker -zmuldefs option.
I'm posting this answer just as extra information. I don't think this method is the preferred one by far, but I find it interesting to know about, and it relates closely to the very interesting question you asked.
This way the linker will ignore multiple definitions and it will take the first one it found in the first .c file.
I must stress though that I think that -zmuldefs should not be used, except in extreme cases where you know what you are doing and you have no better option.
I see the -Dxxx solution mentioned above as far superior, although it too requires extra parameters to the build (the only arguable minor advantage of the -zmuldefs approach is that you don't need to change the parameters passed to clang, gaining some transparency, but I don't think this is actually an advantage, as explained below).
And when you see a macro, it shows clear intention to the programmer who will maintain the code later, as opposed to looking for main() and finding it in thousands of files, not knowing where it is actually used except when you compile and debug.
Also, with -zmuldefs you need to pay attention to the order in which you list the files, because it matters.
That said, if I have 2 files, lets say x1.c and x2.c, where
x1.c is:
#include <stdio.h>
#include "x2.h"
int x();
int main() {
x();
}
int x() {
static int isRecursiveCall_ = 0;
printf("Hello, World 1!\r\n");
if (!isRecursiveCall_) {
isRecursiveCall_ = 1;
main();
x3();
isRecursiveCall_ = 0;
}
return 0;
}
and x2.c is:
#include <stdio.h>
#include "x2.h"
int main() {
printf("Hello, World 2!\r\n");
x3();
return 0;
}
int x3() {
printf("Hello, World 3!\r\n");
return 0;
}
Where x2.h is
#ifndef LINKTEST_X_H
#define LINKTEST_X_H
int main2();
int x2();
int x3();
#endif //LINKTEST_X_H
You can call clang x2.c -Xlinker -zmuldefs -o test.o, and running ./test.o will output:
Hello, World 2!
Hello, World 3!
And then you can call clang x1.c x2.c -Xlinker -zmuldefs -o prog.o and run ./prog.o; the result is:
Hello, World 1!
Hello, World 1!
Hello, World 3!
As desired.
Again, the only advantage here is not having to know when to use the -Dxxx parameter and there are many drawbacks.
In my example by using the -Dxxx method you would have to make it clear in the code what function you would be using. For instance if I change x2.c as follows:
#include <stdio.h>
#include "x2.h"
#ifdef UNIT_TEST_ENABLED
int main() {
printf("Hello, World 2!\r\n");
x3();
return 0;
}
#endif
int x3() {
printf("Hello, World 3!\r\n");
return 0;
}
Then, to build the standalone version of x2.c, you call clang x2.c -DUNIT_TEST_ENABLED -o test.o, and to build everything you just call clang x1.c x2.c -o prog.o.
This makes the intentions clear both in the code and in the build script, and also allows you to list the files in any order. Far better. Even better if you separate your unit tests into another file, as also mentioned in previous answers.
In the end, having to know explicit build parameters and having an explicit #define or separate files, which is what you first wanted to avoid, is actually the best thing to do. There are times where transparency is good but others where it is evil.
An .o-file is just a pile of machine code with a symbol table. They hold little other information.
What you can do is link them into different configurations. In your Makefile:
OBJS = some_file.o other_file.o
test_prog: $(OBJS) test_main.o
real_prog: $(OBJS) real_main.o
all: test_prog real_prog
The same functionality is available in other build tools, in case you use something other than make.
Now you are re-using the same object files to put together different objects. Linking is a relatively fast operation compared to compilation.
Note that this doesn't save you any disk space. I can't tell from your question if that was a concern.
If disk space is an issue, you can use the age-old trick of making a symlink to the main executable with a different name, then looking at argv[0] and changing the behaviour of the program depending on the name it is being called with.
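The argv[0] trick could look roughly like this. It is only a sketch: the function names and the "test_" prefix convention are invented for the example, and main() would simply call wants_test_mode(argv[0]) to pick a code path.

```c
#include <string.h>

/* Return the last path component of argv[0], e.g. "/usr/bin/foo" -> "foo". */
const char *invoked_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Hypothetical policy: a name starting with "test_" selects test mode. */
int wants_test_mode(const char *argv0)
{
    return strncmp(invoked_name(argv0), "test_", 5) == 0;
}
```

With a symlink such as ln -s real_prog test_prog, the same binary would then behave differently depending on which name it is started under.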

Replacing startup files in a bare metal embedded system

I'm using the gcc cross-compiler for the LEON2 processor (Sparc v8). However, after inspecting the startup code I wanted to provide my own for various reasons (we are working on space applications, the code is not up to standards, and it's also very complicated in my opinion; I also find this part very interesting). And since we are only using the ported newlib and no RTOS, I thought this could be doable without too much work.
In order to do this I compiled my application with -nostartfiles and so I provided my own start entry point etc... The code was taken from the actual startup files with some modification from me.
This seems to work however I'm now stuck at the part where I'm supposed to initialize the newlib.
In order to understand how this works I compiled with and without the -nostartfiles flag to find the differences. I'm only showing the last part here when gcc calls collect2 since the other parts are not relevant.
This is a simple application built without -nostartfiles
/opt/sparc-elf-4.4.2/bin/../libexec/gcc/sparc-elf/4.4.2/collect2
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/../../../../sparc-elf/lib/locore_mvt.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/../../../../sparc-elf/lib/crt0.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crti.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtbegin.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/../../../../sparc-elf/lib/pnpinit_simple.o
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/../../../../sparc-elf/lib
/tmp/ccCfxzLR.o
-lgcc
--start-group -lc -lgcc -lleonbare --end-group
-lgcc
--start-group -lc -lgcc -lleonbare --end-group
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtend.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtn.o
Now after some tweaking I managed to get something similar for my application.
/opt/sparc-elf-4.4.2/bin/../libexec/gcc/sparc-elf/4.4.2/collect2
-o debug/BSW
-L../../drv_EEPROM/SRC/debug
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc
-L/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/../../../../sparc-elf/lib
-Map BSW_memory.map
debug/trap_table.o
debug/trap_reset.o
debug/trap_access_exception.o
debug/trap_fp_disabled.o
debug/trap_window_overflow.o
debug/trap_window_underflow.o
debug/cpu_info.o
debug/cpu_init.o
debug/init_hooks.o
debug/start_sequence.o
debug/main.o
debug/pnpinit_simple.o
debug/register_atexit.o
-lDrvEEPROM
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crti.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtbegin.o
-lgcc -lc -lgcc -lleonbare -lgcc -lc -lgcc -lleonbare
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtend.o
/opt/sparc-elf-4.4.2/bin/../lib/gcc/sparc-elf/4.4.2/crtn.o
-Tlinkbootram
I've debugged this in GDB and I can see my code is properly executed, at least it seems, up until the main is called. My main is very simple with just a call to printf. Sadly this call to printf never returns and seems to loop forever. So I guess some initialization is not correctly done.
I'm trying to understand how things are done but all this stuff is quite obscure and not a lot of documentation seems available. I'm still learning so I probably am missing something important but maybe some knowledgeable people on here could provide some way forward? Thanks.
There is some good documentation on porting Newlib here, including implementation of the C Runtime initialisation. The CRT tasks include:
Set up the target platform in a consistent state. For example setting up appropriate exception vectors.
Initialize the stack and frame pointers
Invoke the static data initialization including C++ constructors for static objects.
Carry out any further platform specific initialization.
Call the C main function.
Call C++ static object destructors on exit from main().
Exit with the return code supplied on exit from main().
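As an illustration of the "static data initialization" item in the list above, here is a rough sketch in C. It is not taken from the LEON or Newlib startup code: in a real startup file the addresses and sizes would come from symbols defined by the linker script, whereas here they are plain parameters (with made-up names) so the logic can be read in isolation. Whether the copy step is needed at all depends on whether the image runs from ROM or is loaded straight into RAM.

```c
#include <stddef.h>

/* Sketch of the static data initialization step of a C runtime.
 * All parameter names are hypothetical; a real port would take these
 * values from linker-script symbols. Plain byte loops are used because
 * library routines may not be usable this early in startup. */
void crt_init_static_data(unsigned char *data_start,
                          const unsigned char *data_load,
                          size_t data_size,
                          unsigned char *bss_start,
                          size_t bss_size)
{
    size_t i;

    /* Copy the initial values of initialized variables into RAM. */
    for (i = 0; i < data_size; i++)
        data_start[i] = data_load[i];

    /* C expects zero-initialized statics, so clear the .bss region. */
    for (i = 0; i < bss_size; i++)
        bss_start[i] = 0;
}
```

If this step is skipped or done with wrong bounds, library code that relies on static state (stdio buffers, for instance) can misbehave in ways that are hard to trace back to the startup code.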
It is not clear how you have determined that printf() is not returning, but there are a number of possibilities, such as insufficient stack. Or it does in fact return, but when the printf() call returns main() will exit, and what happens then is up to your C runtime; presumably it will loop indefinitely? If exit from main() disables interrupts and your printf() output is buffered and interrupt driven, you might not see any output. Another possibility is that, as the last statement in main(), the printf() call's return may be optimised by "merging" it with the main() return. I suggest you at least place a for(;;); at the end of main() to block it from returning.
printf() itself is somewhat heavyweight as a starting point for a test; it requires significant stack space and working support for stdout and syscalls. I'd perhaps start with something lower level and simpler, with fewer or no dependencies on the library, such as toggling a GPIO or unbuffered serial output.

Compiling without libc

I want to compile my C-code without the (g)libc. How can I deactivate it and which functions depend on it?
I tried -nostdlib but it doesn't help: The code is compilable and runs, but I can still find the name of the libc in the hexdump of my executable.
If you compile your code with -nostdlib, you won't be able to call any C library functions (of course), but you also don't get the regular C bootstrap code. In particular, the real entry point of a program on Linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib -m32:
// Tell the compiler incoming stack alignment is not RSP%16==8 or ESP%16==12
__attribute__((force_align_arg_pointer))
void _start() {
/* main body of program: call main(), etc */
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
__builtin_unreachable(); // tell the compiler that control never gets here: the exit system call does not return
}
The _start() function should always end with a call to exit (or other non-returning system call such as exec). The above example invokes the system call directly with inline assembly since the usual exit() is not available.
The simplest way is to compile the C code to object files (gcc -c to get some *.o files) and then link them directly with the linker (ld). You will have to link your object files with a few extra object files such as /usr/lib/crt1.o in order to get a working executable (between the entry point, as seen by the kernel, and the main() function, there is a bit of work to do). To know what to link with, try linking with the glibc, using gcc -v: this should show you what normally comes into the executable.
You will find that gcc generates code which may have some dependencies to a few hidden functions. Most of them are in libgcc.a. There may also be hidden calls to memcpy(), memmove(), memset() and memcmp(), which are in the libc, so you may have to provide your own versions (which is not hard, at least as long as you are not too picky about performance).
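Simple replacements along these lines might look as follows. This is a sketch that favors clarity over speed. The my_ prefix is only there so the example can coexist with a normal libc; in an actual libc-free build the functions would need the standard names memcpy and memset so that compiler-generated calls resolve to them.

```c
#include <stddef.h>

/* Byte-by-byte copy; like memcpy, the regions must not overlap. */
void *my_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Fill n bytes with the low byte of value, as memset does. */
void *my_memset(void *dest, int value, size_t n)
{
    unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)value;
    return dest;
}
```

One caveat when using the real names: GCC's optimizer can recognize such loops and turn them back into a call to memcpy or memset, that is, into a call to the function being defined. If that shows up in the assembly, -fno-tree-loop-distribute-patterns is the option usually suggested to suppress it.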
Things might get clearer at times if you look at the produced assembly (use the -S flag).

Resources