Some general C questions - c

I am trying to fully understand the process from writing code in some language to its execution by the OS. In my case, the language would be C and the OS would be Windows. So far I have read many different articles, but I am not sure whether I understand the process correctly, and I would like to ask whether you know some good articles on some subjects I couldn't find.
So, what I think I know about C (and basically other languages):
The C compiler itself handles only data types, basic math operations, pointer operations, and work with functions. By work with functions I mean how to pass arguments to a function and how to get output from it. During compilation, a function call is replaced by pushing the arguments onto the stack, and then, if the function is not inline, the call is replaced by some symbol for the linker. The linker then finds the function definition and replaces the symbol with a jump address to that function (and of course the program then jumps back).
If the above is generally true and I have it right, where in the final .exe file does the linker actually save the functions? After the main() function? And what creates the .exe header? The compiler or the linker?
Now, the additional capabilities of C, today known as the C standard library, are a set of functions and their declarations that other programmers wrote to extend and simplify use of the C language. But functions like printf() were (or could be?) written in a different language, or in assembler. And here comes my next question: can the printf() function, for example, be written in pure C without the use of assembler?
I know this is quite a big question, but I mostly just want to know whether I am right or not. And trust me, I have read lots of articles on the web, and I would not ask you if I could find this information together in one place, in one article. Instead I have to gather the information piece by piece, so I am not sure if I am right. Thanks.

I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.
The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken up by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low-level instructions to push things onto the stack, change the instruction pointer, etc.
The C library and any other libraries you would use are already available in this compiled form.
The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.
As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.
The format of an executable depends on the operating system, but much of the body of the executable in Windows is very similar to that of the object files. Imagine that someone merged together all the object files and then added some glue. The glue does loading-related stuff and then invokes main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before main() that would display a splash screen with their name.
One thing to note, though, is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this were part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are resolved at load time or run time, depending on the operating system.

I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some idea at my blog http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to get all links, but in totality, those should get you started.

The compiler is responsible for translating all your functions written in C into machine code, which it saves in object files; the linker then combines these into the final file (a DLL or EXE, for example). So, if you write a .c file that has a main function and a few other functions, all of them end up as machine code saved together in the EXE file. Then, when you run the file, the loader (which is part of the OS) jumps to the executable's entry point, and the startup code there calls the main function. Otherwise, the main function is just like any other function for the compiler.
The linker is responsible for resolving any references between functions and variables in one object file and their definitions in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at link time (or at load time for dynamically linked libraries).
printf() can indeed be written in pure C. What it does is make a system call into the OS, which knows how to actually send characters to the standard output (like a terminal window). When you call printf() in your program, the linker is responsible for linking your call to the printf() function in the standard C library. When the function is called at run time, printf() formats the arguments properly and then makes the appropriate OS system call to actually display the characters.
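To make the "pure C" point concrete, here is a minimal sketch of the formatting half of a printf()-like function. The name mini_format and the small set of supported conversions (%d, %s, %c, %%) are my own assumptions for illustration; a real C library handles far more. The result is built in a caller-supplied buffer, and a real implementation would then hand that buffer to the OS (for example through a write-style system call).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Store one character if there is room, and always count it. */
static void put_char(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[*len] = c;
    (*len)++;
}

/* Append a signed decimal integer without any library formatting call. */
static void put_int(char *out, size_t cap, size_t *len, int value)
{
    char digits[24];   /* more than enough for a 64-bit int */
    size_t n = 0;
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;

    if (value < 0)
        put_char(out, cap, len, '-');
    do {
        digits[n++] = (char)('0' + u % 10);   /* least significant first */
        u /= 10;
    } while (u != 0);
    while (n > 0)
        put_char(out, cap, len, digits[--n]);
}

/* Hypothetical mini formatter: supports only %d, %s, %c and %%.
 * Returns the length the full result would have, like snprintf(). */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    assert(out != NULL && cap > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put_char(out, cap, &len, *fmt);
            continue;
        }
        fmt++;                       /* look at the conversion character */
        if (*fmt == '\0')
            break;                   /* stray '%' at the end: ignore it */
        switch (*fmt) {
        case 'd':
            put_int(out, cap, &len, va_arg(ap, int));
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put_char(out, cap, &len, *s++);
            break;
        }
        case 'c':
            put_char(out, cap, &len, (char)va_arg(ap, int));
            break;
        case '%':
            put_char(out, cap, &len, '%');
            break;
        default:                     /* unknown conversion: copy it as-is */
            put_char(out, cap, &len, '%');
            put_char(out, cap, &len, *fmt);
            break;
        }
    }
    va_end(ap);
    out[len < cap ? len : cap - 1] = '\0';
    return (int)len;
}
```

Calling mini_format(buf, sizeof buf, "%s has %d items", "cart", 3) fills buf with the formatted text. Nothing here needs assembler; only the final step of passing the bytes to the operating system depends on the platform.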

Related

Remote update-able function or code in a statically linked firmware of embedded device (microcontroller) [closed]

Closed 4 years ago.
Is it possible to remote update or patch only certain piece of code to an embedded device (microcontroller)?
I am writing a bare metal C program on a microcontroller. Let's say I have a main program which consists of functions A(), B(), C() and D(), each in its own .c file.
I would like to do a remote update (patching) of only B() which has its own custom section in the main program so that the address is fixed and known.
The thing that puzzles me is how do I produce an updated executable of B()? Given that it relies on standard C library or other global variables from the main program. I need to resolve all the symbols from the main program.
Would appreciate any suggestions or reference to other thread if this question has been asked before (I tried to search but couldn't find any)
Solutions:
from Russ Schultz: Compile the new B(), and link it with the symbol file previously generated from the main program, using the linker's --just-symbols option. The resultant elf will contain just B() and assumes all these other symbols are there. (I haven't tried it but this is the concept that I was looking for)
Compile the whole program with the new B(), and manually take only the binary of B() from the main program (because its section address and size are known). Send the B() binary to the device for remote update (not efficient, as it involves a lot of manual work).
Dynamic loading and linking requires run-time support - normally provided by an OS. VxWorks for example includes support for that (although the code is normally loaded into RAM over a network or mass-storage file-system rather than Flash or other re-writable ROM).
You could in theory write your own run-time linker/loader. However for it to work, the embedded firmware must contain a symbol table in order to complete the link. The way it works in VxWorks, is the object code to be loaded is partially linked and contains unresolved symbols that are completed by the run-time linker/loader by reference to the embedded symbol table.
Other embedded operating systems that support dynamic loading are:
Precise/MQX
Nucleus
Another approach that does not require a symbol table is to provide services via a software interrupt API (as used in PC-BIOS and MS-DOS). The loaded module will necessarily have restricted access to the services provided by the API, but because they are interrupts, the actual location of such services does not need to be known to the loadable module, nor explicitly linked at run-time.
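As a rough illustration of the software interrupt idea, the sketch below simulates it on a host with an ordinary function call. The service numbers, the service_dispatch name and the two example services are all assumptions made up for this sketch. On real hardware the module would execute a trap instruction with the service number in a register, and the handler in the firmware would do the table lookup, so the module would not need the address of anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical service numbers shared by the firmware and loadable modules. */
enum service_id { SVC_LED_SET = 0, SVC_ADD_TICKS = 1, SVC_COUNT };

/* Firmware-side state touched by the services (simulated). */
int led_state;
uint32_t ticks;

static int32_t svc_led_set(int32_t arg)
{
    led_state = (arg != 0);
    return 0;
}

static int32_t svc_add_ticks(int32_t arg)
{
    ticks += (uint32_t)arg;
    return (int32_t)ticks;
}

/* The table lives in the firmware; modules only ever see the numbers. */
static int32_t (*const service_table[SVC_COUNT])(int32_t) = {
    svc_led_set,
    svc_add_ticks
};

/* Stand-in for the interrupt handler: validates the number, then
 * dispatches through the table. */
int32_t service_dispatch(uint32_t id, int32_t arg)
{
    if (id >= (uint32_t)SVC_COUNT)
        return -1;                   /* unknown service */
    return service_table[id](arg);
}

/* A "loadable module": it knows the service numbers but not where the
 * service functions are located. */
int32_t module_entry(void)
{
    service_dispatch(SVC_LED_SET, 1);
    return service_dispatch(SVC_ADD_TICKS, 5);
}
```

The restriction mentioned above shows up directly: the module can reach only what the table exposes, and the numbering becomes an interface that has to stay stable across firmware versions.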
There is an article on dynamically loading modules in small RTOS systems on Embedded.com:
Bring big system features to small RTOS devices with downloadable app modules.
The author John Carbone works for Express Logic who produce ThreadX RTOS, which gives you an idea of the kind of system it is expected to work on, although the article and method described is not specific to ThreadX.
Some approach like this requires that the programmer manually manages all code allocation with their own custom segments. You'd have to know the fixed address of the function and it can't be allowed to grow beyond a certain size.
The flash memory used will dictate the restrictions, namely how large an area do you need to erase before programming. If you can execute code from eeprom/data flash then that's the obvious choice.
Library calls etc are irrelevant as the library functions are most likely stored elsewhere. Or in the rare case where they are inlined, they'll be small. But you might have to write the function in assembler, since C compiler-generated machine code may screw up the calling convention or unexpectedly overwrite registers if taken out of the expected context.
Because all of the above is fairly complex, the normal approach is to only modify const variables, rather than code, and keep those in eeprom/data flash, then have your program act based on those values.
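A minimal sketch of that last approach, assuming a made-up parameter block with a magic number as validity check. The struct layout, the names and the simulated_flash variable (standing in for a real eeprom/data flash region) are hypothetical; a real device would read from a fixed address and would probably use a checksum as well.

```c
#include <assert.h>
#include <stdint.h>

#define PARAMS_MAGIC 0x50415231u   /* arbitrary marker for a valid block */

/* Tunable values the program acts on instead of patching code. */
struct tuning_params {
    uint32_t magic;
    uint16_t blink_period_ms;
    uint16_t retry_limit;
};

/* Defaults compiled into the firmware image. */
static const struct tuning_params default_params = { PARAMS_MAGIC, 500, 3 };

/* Stand-in for the eeprom/data flash area; erased flash reads as all ones. */
static struct tuning_params simulated_flash = { 0xFFFFFFFFu, 0xFFFFu, 0xFFFFu };

/* Simulates a remote update writing a new parameter block. */
void simulate_flash_write(uint16_t blink_period_ms, uint16_t retry_limit)
{
    simulated_flash.magic = PARAMS_MAGIC;
    simulated_flash.blink_period_ms = blink_period_ms;
    simulated_flash.retry_limit = retry_limit;
}

/* Use the stored block if it looks valid, otherwise fall back to defaults. */
const struct tuning_params *active_params(void)
{
    if (simulated_flash.magic == PARAMS_MAGIC)
        return &simulated_flash;
    return &default_params;
}
```

The program calls active_params() wherever it needs a value, so a remote update only has to rewrite the small data block and never touches code.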
I'm assuming you're talking about bare metal with my answer.
First off, linking a new function B() with the original program is relatively simple, particularly with GCC. LD can take a 'symbol file' as input using the --just-symbols option. So, scrape your map file, create the symbol file and use it as an input to your new link. Your resultant elf will contain just your code that assumes all these other symbols are there.
At that point, compile your new function B(), which should be a different name than B() (so we'll choose B_()). It should have the exact same signature as B() or things won't work right. You have to compile with the same headers, etc. that your original code was compiled with or it likely won't work.
Now, depending on how you've architected your original program, life can be easy or a real mess.
If you make your original program with the idea of patching in mind, then the prep is relatively trivial. Identify which functions you might want to patch and then call them through function pointers, e.g.:
void OriginalB(void)
{
    //Original implementation of B goes here
}
void (*B)(void) = OriginalB; //global function pointer, the patch point
void main(void)
{
    B(); //this calls OriginalB() through the function pointer B. Once you
         //patch the global function pointer B to point to B_(), then this
         //code will call your new function.
}
Now your patch program is the original program linked with your B_(), but you somehow have to update the global function pointer B to point to B_() (rather than OriginalB())
Assuming you can use your new elf (or hex file) to update your device, it's pretty easy to just go modify those to change the value of B or assign the new function pointer directly in your code.
If not, then whatever method of injection you need to do also needs to inject a change to the global pointer.
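The pointer swap itself can be tried on a host machine. This is only a sketch of the concept: original_b, patched_b (standing in for B_()), b_ptr and apply_patch are names I made up, and on a real device the new pointer value would arrive through whatever update mechanism you have rather than through a direct function call.

```c
#include <assert.h>
#include <stddef.h>

/* Original implementation that shipped with the firmware. */
int original_b(int x)
{
    return x + 1;
}

/* Replacement with the exact same signature; stands in for B_(). */
int patched_b(int x)
{
    return x + 2;
}

/* The patch point: every caller goes through this pointer. */
static int (*b_ptr)(int) = original_b;

/* Code in the main program that uses B only via the pointer. */
int run_main_logic(int x)
{
    return b_ptr(x);
}

/* What the update mechanism has to achieve in the end: redirect the
 * global pointer to the new function. */
void apply_patch(int (*replacement)(int))
{
    assert(replacement != NULL);
    b_ptr = replacement;
}
```

Before apply_patch(patched_b) is called, run_main_logic() runs the original code; afterwards it runs the replacement, without any of its callers being rebuilt.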
If you didn't prep your original program, then it can be a real bear (but doable) to go modify references to B() to instead jump to your new B_(). It might get super tricky if your new function is too far away for a relative jump, but still doable in theory. I've never actually done it. ;)
If you're trying to patch a ROM, you almost have to have prepped the original ROMmed program to use function pointers for potential patch points. Or have some support in the ROM hardware to allow limited patching (usually it's just a few locations it will let you patch).
Some of the details may be incorrect for GCC (I use the Keil tools in my professional flow), but the concept is the same. It's doable. It's fragile. There's no standard way of doing this and it's highly tool and application dependent.

how to catch calls with LD_PRELOAD when unknown programs may be calling execve without passing environment

I know how to intercept system calls with LD_PRELOAD, that occur in compiled programs I may not have source for. For example, if I want to know about the calls to int fsync(int) of some unknown program foobar, I compile a wrapper
int fsync(int)
for
(int (*) (int))dlsym(RTLD_NEXT,"fsync");
into a shared library and then I can set the environment variable LD_PRELOAD to that and run foobar. Assuming that foobar is dynamically linked, which most programs are, I will know about the calls to fsync.
But now suppose there is another unknown program foobar1 and in the source of that program was a statement like this:
execve("foobar", NULL, NULL)
that is, the environment was not passed. Now the whole LD_PRELOAD scheme breaks down?
I checked by compiling the statement above into foobar1; when that is run, the calls from foobar are not reported.
While one can safely assume most modern programs are dynamically linked, one cannot at all assume how they may or may not be using execve?
So then, the whole LD_PRELOAD scheme, which everybody says is such a great thing, is not really working unless you have the source to the programs concerned, in which case you can check the calls to execve and edit them if necessary. But in that case, there is no need for LD_PRELOAD, if you have sources to everything. LD_PRELOAD is specifically, supposed to be, useful when you don't have sources to the programs you are inspecting.
Where am I wrong here - how can people say, that LD_PRELOAD is useful for inspecting what unknown programs are doing??
I guess I could also write a wrapper for execve. In the wrapper, I add to the original envp argument, one more string: "LD_PRELOAD=my library" . This "seems" to work, I checked on simple examples.
I am not sure if I should be posting an "answer" which may very easily exceed my level of C experience.
Can somebody more experienced than me comment if this is really going to work in the long run?
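For reference, the execve wrapper idea could look roughly like the sketch below. The helper name env_with_preload, the BUILD_PRELOAD_LIBRARY macro and the library path are assumptions for illustration. The helper is plain C and can be tested on its own; the wrapper is only compiled when the file is built as the preload library. Programs that start children through other calls (the other exec variants, posix_spawn and so on) may need their own wrappers, and statically linked programs are not affected by LD_PRELOAD at all.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a new NULL-terminated environment holding every entry of envp
 * except existing LD_PRELOAD= entries, followed by preload_entry (a
 * complete "LD_PRELOAD=..." string). envp may be NULL. The strings are
 * shared with the caller; only the returned array must be freed.
 */
char **env_with_preload(char *const envp[], const char *preload_entry)
{
    size_t count = 0, kept = 0, i;
    char **result;

    assert(preload_entry != NULL);
    if (envp != NULL)
        while (envp[count] != NULL)
            count++;

    result = malloc((count + 2) * sizeof *result);
    if (result == NULL)
        return NULL;

    for (i = 0; i < count; i++)
        if (strncmp(envp[i], "LD_PRELOAD=", 11) != 0)
            result[kept++] = envp[i];
    result[kept++] = (char *)preload_entry;
    result[kept] = NULL;
    return result;
}

#ifdef BUILD_PRELOAD_LIBRARY   /* hypothetical macro: set it when building the .so */
#include <dlfcn.h>

/* Wrapper that forces LD_PRELOAD into the environment of the new program. */
int execve(const char *path, char *const argv[], char *const envp[])
{
    int (*real_execve)(const char *, char *const[], char *const[]);
    char **env;
    int rc, saved_errno;

    *(void **)(&real_execve) = dlsym(RTLD_NEXT, "execve");
    /* The path below is a placeholder for your own library. */
    env = env_with_preload(envp, "LD_PRELOAD=/path/to/libwrap.so");
    if (real_execve == NULL || env == NULL) {
        free(env);
        errno = ENOMEM;
        return -1;
    }
    rc = real_execve(path, argv, env);
    saved_errno = errno;       /* we only get here if the exec failed */
    free(env);
    errno = saved_errno;
    return rc;
}
#endif
```

Built with something like gcc -shared -fPIC -DBUILD_PRELOAD_LIBRARY (plus -ldl on older glibc versions), this also covers the execve("foobar", NULL, NULL) case from the question, because a NULL envp is treated as an empty environment.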

When I debug a C program with gdb, and key in 'p system', what exactly do I get?

Before I go deep into my questions, I need to confess that I am still fairly inexperienced in this subject and am confused about quite a number of concepts, so please bear with me if my manner of asking these questions seems unorganized.
I recently learnt that the standard C library is loaded into every C program we compile (is this because we have #include at the beginning of the source file? [question1]), so its functions are loaded into memory. So the system() function would already be loaded and stored somewhere in memory, and I was told that I could find the exact address where system() is stored by debugging any C program with gdb and issuing the command 'p system', which prints the address of the function. I understand that 'p' is used to print a variable in gdb, and 'system' in this case probably denotes the address of the system() function, so it seems to make sense. But then I thought to myself: wait a second, I have not used the system() function anywhere in my code, so why would the inventor of gdb include such a variable for me to print the address of a function that I don't even use? Does this imply that the address of every function in the standard C library can be found in the same fashion, and that they all have a corresponding variable name in gdb? [question2]
One more question unrelated to stuff I talked above is whether functions like system(), execve() and many others are specific to Linux OS, or they are also used in Windows OS? [question3]
Hope that you guys can help me out. Thanks in advance!
The standard C library is linked with every program because it's necessary for it to be there to be able to run your program. There's a lot of things happening in your program before your main function gets called and after it returns, the standard library takes care of this. It also provides you with most of the standard functions you can call. You can compile things without a standard library, but that's an advanced topic. This is pretty much unrelated to #include.
Gdb can see system with p because it prints more than just variables. It prints anything that is in scope. system just happens to be a symbol that's visible to you in that scope. You could print any symbol that's visible to you, including all the globally visible variables and functions in libc and your program. Symbols in this context means "names of various things that need to be findable by programs and other libraries", this includes all functions, variables, section boundaries and many other things that the compiler/linker/runtime/debugger need to find to do its job.
Usually the standard library gets linked dynamically, which means that every program uses the exact same copy of the library. In that case all symbols in it will be visible to your program because there's no reason to exclude them. If you link your program statically, only the necessary parts of libc will be included and you would probably not see the system symbol unless you actually use that function.

Is it possible for a program written in C to download an external function and treat this external function as a compiled and linked shared object?

I am working on a program in C, and I am having trouble with libconfig.h. Because of this, I think if I could have my program download an external function from the Internet (using libcurl.h) and have my program treat it as a compiled and linked shared object, that would be perfect. It would need to work on all desktop platforms (Windows, Mac, and Linux), so no .dll's, and would have to be downloaded by the program, treated as a function, and then get deleted by the program. So, my question is: is that possible in C?
The reason that I need to download it separately is because the function would need to be updated regularly, and requiring the user to download a new version of the program regularly would defeat the purpose of the program.
Well the closest to what you ask for would be this
Download .so/.dll using curl
Dynamically load .so/.dll into your process
set up function pointer in your process to point to a function in .so/.dll
On Windows:
HMODULE handle = LoadLibrary("mylib.dll");
if (handle)
    myfunc = GetProcAddress(handle, "myfunc"); /* cast the result to myfunc's function-pointer type */
To unload, call
FreeLibrary(handle);
It decreases ref count, and the DLL is actually unloaded when ref count hits 0.
On Linux, check this post:
How do I load a shared object in C++?
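For the Linux side, the equivalent calls are dlopen(), dlsym() and dlclose() from <dlfcn.h>. Below is a small sketch; the helper name call_from_library is made up, and it assumes the symbol being looked up has the signature double f(double).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Load `library`, look up `symbol` as a double(double) function, call it
 * with x and store the result. Returns 0 on success, -1 on failure.
 */
int call_from_library(const char *library, const char *symbol,
                      double x, double *result)
{
    double (*fn)(double);
    void *handle;

    assert(result != NULL);
    handle = dlopen(library, RTLD_NOW);
    if (handle == NULL)
        return -1;                       /* dlerror() has the details */

    /* POSIX-style way to store the void * from dlsym() in a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);                     /* drops the reference again */
    return 0;
}
```

For example, call_from_library("libm.so.6", "cos", 0.0, &r) should load the math library on a typical glibc system and call cos() from it. On older glibc versions you need to link with -ldl. A downloaded .so would be loaded the same way by passing its path.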
You can't just treat it as compiled; you would have to do one of two things:
Actually compile it on the fly, then load it as a dynamic library, which requires ensuring that there is a compiler on the system and will probably cause an unholy mess of errors on the user end.
Build your own C parser to interpret the external function, which is no small feat.
Far simpler solution: just write a function that works and compile platform-specific versions of it into your binary (or a library, if you prefer) before shipping the product.
You could link a Python interpreter into your program and have it execute a Python version of your function.
This approach would actually work with different languages, such as Java, Ruby, etc.

hidden routines linked in c program

Hello,
When one disassembles a win32 exe program compiled by a C compiler, it shows that some compilers link some 'hidden' routines into it, I think even if the C program is an empty one and is 5 bytes or so. I understand that those 5 bytes are wrapped in the PE .exe format, but why put some routines in there? It seems unnecessary to me and even somewhat annoys me. What is that? Can it be omitted? As I understand it, a C program (not speaking about C++ right now, which I know has some initial routines) should not need such additional hidden functions.
Many thanks for an answer, maybe even a link to some extended info, because this topic interests me a lot.
//edit
OK, here is some disassembly I did a while back. Digital Mars and the old Borland command-line compiler (which I also tested) both produce much more code (and I am especially interested in bcc32), but they do not include readable names/symbols in such disassembly, so I will not post them here. These are somewhat readable, but I am not experienced in understanding what they are ;-)
https://dl.dropbox.com/u/42887985/prog_devcpp.htm
https://dl.dropbox.com/u/42887985/prog_lcc.htm
https://dl.dropbox.com/u/42887985/prog_mingw.htm
https://dl.dropbox.com/u/42887985/prog_pelles.htm
Any explanatory comments on what is in here? (I am afraid there may be some C++ sh*t here; I am interested in pure C add-ons, not C++, but I am too tired now to make sure that it was compiled in C mode. The extension of the compiled empty-main program was .c, so I assumed the output would be C, not C++.)
Thanks for longer explanations of what it is.
Since your win32 exe file is a dynamically linked object file, it will contain the necessary data needed by the dynamic linker to do its job, such as names of libraries to link to, and symbols that need resolving.
Even a program with an empty main() will link with the c-runtime and kernel32.dll libraries (and probably others? - a while since I last did Win32 dev).
You should also be aware that main() is only the entry point of your program - quite a bit has already gone on before this point, such as retrieving and tokenizing the command line, setting up the locale, creating stderr, stdin, and stdout, and setting up the other mechanisms required by the c-runtime library, such as atexit(). Similarly, when your main() returns, the runtime does some clean-up - and at the very least needs to call the kernel to tell it that you're done.
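To give a feel for what that hidden code does, here is a toy sketch of a startup routine. It is not what any particular compiler's runtime looks like; sketch_startup, sketch_atexit and the demo functions are invented names, and the real thing also sets up the heap, stdio, the locale and more. It splits a command line into argv, calls a main-like function, then runs the registered exit handlers in reverse order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 16
#define MAX_EXIT_HANDLERS 8

static void (*exit_handlers[MAX_EXIT_HANDLERS])(void);
static size_t exit_handler_count;

/* Toy version of atexit(): remember a function to call after main. */
int sketch_atexit(void (*fn)(void))
{
    if (fn == NULL || exit_handler_count == MAX_EXIT_HANDLERS)
        return -1;
    exit_handlers[exit_handler_count++] = fn;
    return 0;
}

/* Split a writable command line on spaces; returns the argument count. */
static int split_command_line(char *cmdline, char **argv, int max_args)
{
    int argc = 0;
    char *p = cmdline;

    while (*p != '\0' && argc < max_args) {
        while (*p == ' ')
            p++;                         /* skip separators */
        if (*p == '\0')
            break;
        argv[argc++] = p;                /* start of a token */
        while (*p != '\0' && *p != ' ')
            p++;
        if (*p == ' ')
            *p++ = '\0';                 /* terminate the token */
    }
    return argc;
}

/* Toy "entry point": the work done before and after the user's main. */
int sketch_startup(char *cmdline, int (*user_main)(int, char **))
{
    char *argv[MAX_ARGS + 1];
    int argc, status;

    argc = split_command_line(cmdline, argv, MAX_ARGS);
    argv[argc] = NULL;
    status = user_main(argc, argv);
    while (exit_handler_count > 0)       /* handlers run in reverse order */
        exit_handlers[--exit_handler_count]();
    return status;                       /* a real runtime would now exit */
}

/* Demo "program": registers a cleanup handler and returns its argc. */
int demo_cleanup_ran;
static void demo_cleanup(void) { demo_cleanup_ran = 1; }

int demo_main(int argc, char **argv)
{
    (void)argv;
    sketch_atexit(demo_cleanup);
    return argc;
}
```

With a writable buffer holding "prog -a file.txt", sketch_startup(buffer, demo_main) passes three arguments to demo_main and runs demo_cleanup after it returns, which is the same shape as the prologue and epilogue the linked-in routines provide.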
As to whether it's necessary? Yes, unless you fancy writing your own program prologue and epilogue each time. There probably are ways of writing minimal, statically linked applications if you're sufficiently masochistic.
As for storage overhead, why are you getting so worked up? It's not enough to worry about.
There are several initialization functions that load whenever you run a program on Windows. These functions, among other things, call the main() function that you write - which is why you need either a main() or WinMain() function for your program to run. I'm not aware of other included functions though. Do you have some disassembly to show?
You don't have much detail to go on but I think most of what you're seeing is probably the routines of the specific C runtime library that your compiler works with.
For instance, there will be code that runs from the entry point recorded in the portable executable file and then calls the main(int argc, char **argv) that you wrote in your C program.
