binutils - kernel - "_binary" meaning? - c

I am reading the xv6 lecture notes.
I have a file named initcode.S that is to be linked into the kernel.
Two symbols are created that way, and they are declared like this
extern char _binary_initcode_start[], _binary_initcode_size[];
inside a function.
The lecture says:
as part of the kernel build process, the linker embeds that binary in the kernel and defines two special symbols, _binary_initcode_start and _binary_initcode_size, indicating the location and size of the binary.
I understand that binutils records the address and the size of this assembled code.
I wonder about the notation: is it a default convention? My searches didn't settle it clearly.
_binary -> it was originally assembly code
_initcode -> the name of my file
_start -> the property I am interested in.
That would imply that any compiled assembly file gets those symbols too.
I have no proof of that, though.
The question is:
is _binary_myAsmFileHere_myParameterhere the default symbol pattern binutils gives an assembly file to export its address, size and so on?
Could someone tell me whether my assumption is right or, better still, what the actual rule is?
Thanks

Strangely enough, it doesn't seem to be documented in the ld manual. However, man objcopy does say this:
You can access this binary data inside a program by referencing the
special symbols that are created by the conversion process. These
symbols are called _binary_objfile_start, _binary_objfile_end and
_binary_objfile_size. e.g. you can transform a picture file into an object file and then access it in your code using these symbols.
Apparently the same logic is used by ld when embedding binary files.
Notice that the Makefile for xv6 contains this line for linking the kernel:
$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother
As you can see, it uses -b binary to embed the files initcode and entryother, so the above symbols will be defined during this process.
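A minimal sketch of the naming rule, in C to match the question. As far as I can tell from BFD's binary input support (used by both ld -b binary and objcopy -I binary), the symbol is "_binary_" + the file name as given on the command line + "_" + a suffix, with every non-alphanumeric character turned into '_'. So a path such as dir/pic.png would give _binary_dir_pic_png_start. binary_symbol_name is a hypothetical helper for illustration, not part of binutils.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the name of the symbol that ld -b binary /
 * objcopy -I binary would create for `filename` (assumed mangling rule).
 * suffix is "start", "end" or "size".
 * Returns 0 on success, -1 if out was too small. */
static int binary_symbol_name(const char *filename, const char *suffix,
                              char *out, size_t out_len)
{
    assert(out_len > 0);
    int n = snprintf(out, out_len, "_binary_%s_%s", filename, suffix);
    /* Every non-alphanumeric character (slashes, dots, dashes, ...)
     * becomes '_'; the underscores already in the pattern stay as they are. */
    for (char *p = out; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            *p = '_';
    }
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

On the C side the symbols are declared as in the question. As far as I know, _size is an absolute symbol whose address is the size (nothing is stored at it), which is why xv6 passes (int)_binary_initcode_size instead of reading through it.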

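When a .global symbol is defined in an assembly file and referenced from C, both sides have to agree on its exact spelling so the linker can 'link' the name in the C file with the name in the assembly file. On platforms whose C compiler prepends a '_' to every C identifier, the assembly file must define the symbol with that leading '_'; on ELF targets such as Linux, no underscore is added.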


Where are avr-gcc libraries stored?

I'm trying to locate the .c files that are related to the #include header files in avr.
I want to have a look at some of the standard libraries that are defined in the avr-gcc library, particularly the PORT definitions contained in <avr/io.h>. I searched through the library in /usr/lib/avr/include/avr and found the header file, however what I am looking for is the .c file. Does this file exist? If so, where can I find it? If not, what is the header file referencing?
The compiler-provided libraries are precompiled object code stored in static libraries. With gcc, libraries conventionally have the extension .a ("archive", for largely historical reasons) and the prefix "lib".
At build time, the linker searches the library archives to find the object-code modules needed to resolve references to library symbols. It extracts the required modules and links them into the binary image being built.
In gcc, a library libXXX.a is typically linked using the command-line switch -lXXX, so the libXXX.a naming convention matters in that case. For example, the standard C library libc.a is linked by the switch -lc.
So to answer your question, there are normally no .c files for the compiler-provided libraries shipped with the toolchain. The library need not even have been written in C.
That said, being open source, the source files (.c or otherwise) will be available from the repositories of the various libraries. For example, for the standard C library: https://www.nongnu.org/avr-libc/.
For other AVR architecture and I/O support libraries, you might inspect the associated header files or documentation. The header files will typically have a boilerplate comment with a project URL, for example.
PORTB and other special function registers are usually defined as macros in headers provided by avr-libc. Find your include/avr directory (the one that contains io.h). In that directory, there should be many other header files. As an example, iom328p.h contains the following line that defines PORTB on the ATmega328P:
#define PORTB _SFR_IO8(0x05)
If you are also looking for the libraries that are distributed as .a files, you should run avr-gcc -print-search-dirs.
There are several ways to find out where the system headers are located and which are included:
avr-gcc -v -mmcu=atmega8 foo.c ...
With option -v, GCC will print (amongst other things) which include paths it is using. Check the output in a shell / console, where GCC will print the search paths:
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/avr/5.4.0/include
/usr/lib/gcc/avr/5.4.0/include-fixed
/usr/lib/gcc/avr/5.4.0/../../../avr/include
The last location is for AVR-LibC, which provides avr/io.h. Resolving the ..s, that path is just /usr/lib/avr/include. These paths depend on how avr-gcc was configured and installed, hence you have to run that command with your installation of avr-gcc.
avr-gcc -H -mmcu=atmega8 foo.c ...
Suppose the C-file foo.c reads:
#include <avr/io.h>
int main (void)
{
PORTD = 0;
}
for an easy example. With -H, GCC will print out which files it is actually including:
. /usr/lib/avr/include/avr/io.h
.. /usr/lib/avr/include/avr/sfr_defs.h
... /usr/lib/avr/include/inttypes.h
.... /usr/lib/gcc/avr/5.4.0/include/stdint.h
..... /usr/lib/avr/include/stdint.h
.. /usr/lib/avr/include/avr/iom8.h
.. /usr/lib/avr/include/avr/portpins.h
.. /usr/lib/avr/include/avr/common.h
.. /usr/lib/avr/include/avr/version.h
.. /usr/lib/avr/include/avr/fuse.h
.. /usr/lib/avr/include/avr/lock.h
avr-gcc -save-temps -g3 -mmcu=atmega8 foo.c ...
With -g3 (debug level 3), the macro definitions are recorded in the debug info and are also visible in the pre-processed file kept by -save-temps (*.i for C code, *.ii for C++, *.s for pre-processed assembly). Hence, in foo.i we can find the definition of PORTD as
#define PORTD _SFR_IO8(0x12)
Starting from the line which contains that definition, scroll up until you find the annotation that tells in which file the macro definition happened. For example
# 45 "/usr/lib/avr/include/avr/iom8.h" 3
in the case of my toolchain installation. This means that the lines following that annotation follow line 45 of /usr/lib/avr/include/avr/iom8.h.
If you want to see the resolution of PORTD, scroll down to the end of foo.i which contains the pre-processed source:
# 3 "foo.c"
int main (void)
{
(*(volatile uint8_t *)((0x12) + 0x20)) = 0;
}
0x12 is the I/O address of PORTD, and 0x20 is the offset between I/O addresses and RAM addresses for ATmega8. This means the compiler may implement PORTD = 0 by means of out 0x12, __zero_reg__.
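To see how that expansion hangs together, here is a host-side sketch in C that mirrors avr-libc's macro chain: _SFR_IO8 adds __SFR_OFFSET (0x20) and hands the result to _MMIO_BYTE, which turns it into a volatile byte access. A PC has no PORTD, so the SIM_* macros and the fake_data_space array are hypothetical stand-ins: instead of dereferencing address 0x32 they index a 0x60-byte array covering the ATmega8's I/O registers at data addresses 0x20..0x5F.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the ATmega8 data space up to the end of the
 * I/O registers (data addresses 0x00..0x5F). Not part of avr-libc. */
static volatile uint8_t fake_data_space[0x60];

/* Mirrors avr-libc's _MMIO_BYTE, but indexes the array instead of
 * casting the address to a pointer, so it can run on a PC. */
#define SIM_MMIO_BYTE(mem_addr) (fake_data_space[(mem_addr)])
/* Mirrors __SFR_OFFSET: I/O address + 0x20 = data (RAM) address. */
#define SIM_SFR_OFFSET 0x20
/* Mirror _SFR_IO8 and the PORTD definition from iom8.h. */
#define SIM_SFR_IO8(io_addr) SIM_MMIO_BYTE((io_addr) + SIM_SFR_OFFSET)
#define SIM_PORTD SIM_SFR_IO8(0x12)

/* Same shape as the pre-processed main() in foo.i. */
static void clear_portd(void)
{
    SIM_PORTD = 0; /* expands to (fake_data_space[((0x12) + 0x20)]) = 0 */
}

/* Read any I/O register through the same macro chain. */
static uint8_t sim_read_io(unsigned io_addr)
{
    assert(io_addr < 0x40); /* the 64 registers reachable with in/out */
    return SIM_SFR_IO8(io_addr);
}
```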
avr-gcc -print-file-name=libc.a -mmcu=...
Finally, this command will print the location (absolute path) of libraries like libc.a, libm.a, libgcc.a or lib<mcu>.a. The location of the library depends on how the compiler was configured and installed, but also on command line options like -mmcu=.
avr-gcc -Wl,-Map,foo.map -mmcu=atmega8 foo.c -o foo.elf
This directs the linker to dump a "map" file foo.map where it reports which symbol will drag which module from which library. This is a text file that contains lines like:
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/crtatmega8.o
...
LOAD /usr/lib/gcc/avr/5.4.0/avr4/libgcc.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libm.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libc.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libatmega8.a
libgcc.a is from the compiler's C runtime, and all the others are provided by AVR-LibC. Resolving the ..s, the AVR-LibC files for ATmega8 are located in /usr/lib/avr/lib/avr4/.

How to check the values of a struct from an image/binary file?

Is there any way I can look at the values of a structure after compilation? objdump -td gives the function definitions and only the address where the structure is stored. The problem is that I am getting a wrong address for one of the threads/functions in a structure when I run the program. The target MCU is the LPC1347 (ARM Cortex-M3).
objdump parses object files (products of the compiler), which are relocatable (not executable) ELF files. At this stage, there is no such notion as the memory address these compiled pieces will run at.
You have the following possibilities:
Link your *.obj files into the final non-stripped (-g passed to compiler) executable ELF image and parse it using readelf.
Generate the linker map file by adding -Wl,-Map,file.map to your LDFLAGS and see the output sections and addresses your data is located at in the map file.
Use a debugger/gdb.
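If all you have is a raw image file (for instance one written by objcopy -O binary), you can still read a structure field once the map file tells you its address: the byte offset in the file is the address minus the address the image is linked to start at. A minimal sketch in C, assuming a little-endian image (as on the LPC1347) and a caller-supplied image_base; read_le32_at is a hypothetical helper, and the field offset inside the struct still has to come from your own struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit word (e.g. a function
 * pointer stored in a struct) from a raw image assumed to be loaded at
 * image_base. addr = struct address from the map file + field offset.
 * Returns 0 on success, -1 if the word lies outside the image. */
static int read_le32_at(const uint8_t *image, size_t image_len,
                        uint32_t image_base, uint32_t addr, uint32_t *out)
{
    assert(image != NULL && out != NULL);
    if (addr < image_base)
        return -1;
    uint32_t off = addr - image_base;       /* file offset of the word */
    if (off > image_len || image_len - off < 4)
        return -1;
    *out = (uint32_t)image[off]
         | ((uint32_t)image[off + 1] << 8)
         | ((uint32_t)image[off + 2] << 16)
         | ((uint32_t)image[off + 3] << 24);
    return 0;
}
```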

ld: access beyond end of merged section

I'm trying to link a simple C program on an ARM Debian machine (a Raspberry Pi), and when I link the object file the linker gives me the error in the subject.
My program is as simple as:
simple.c:
int main(){
int a = 2;
int b = 3;
int c = a+b;
}
I compile it with
$>gcc -o simple.obj simple.c
and then link it with
$>ld -o simple.elf simple.obj
ld: simple.obj: access beyond end of merged section (33872)
I can't understand why...
If I try to read the ELF file with objdump -d, it doesn't manage to disassemble the .text section (it only prints the address, the value, .word and the value again preceded by 0x), but the binary data is the same as what I get from the disassembled simple.obj.
The only difference is the start address (and therefore every following address) of the binary data: the ELF file starts at 0x8280, the object file at 0x82a0.
What does all this mean?
EDIT:
this is the dump for the obj file: http://pastebin.com/YZ94kRk4
and this is the dump for the elf file: http://pastebin.com/3C3sWqrC
I tried compiling with the -c option, which makes gcc stop after assembling (without it, gcc had already done the linking), but now I have a different problem: it says that there is no _start section in my object file...
the new dumps are:
simple.obj: http://pastebin.com/t0TqmgPa
simple.elf: http://pastebin.com/qD35cnqw
You are misunderstanding the effect of the commands you ran. If you run:
$ gcc -o simple.obj simple.c
it already creates the program you want to run; it is already linked. You don't need to link it again, and you especially shouldn't run ld directly unless you know what you are doing. The .obj extension doesn't matter; it's just part of the file name, and the content of the file is already a complete Linux program. So if you run:
$ ./simple.obj
it will execute your code.
You usually don't call ld directly; instead you use gcc as a front end to compile and link. This is because gcc also links in important pieces you were leaving out, such as the C library and the startup code that defines _start, and that's the reason why your second attempt resulted in "no _start section" or something like that.
Could you print the output of the objdump -d command?
Btw, notice that 33872 == 0x8450.
I am not familiar with the Raspberry Pi's memory map, so if you're following any tutorial about this or have some other resource to help me help you out, it would be great :)

How to find out *.c and *.h files that were used to build a binary?

I am building a project that builds multiple shared libraries and executable files. All the source files used to build these binaries are in a single /src directory, so it is not obvious which source files were used to build each binary (there is a many-to-many relation).
My goal is to write a script that would parse a set of C files for each binary and make sure that only the right functions are called from them.
One option seems to be to extract this information from the Makefile, but this does not work well with generated files and headers (because of the #include dependencies).
Another option could be to simply browse call graphs, but this would get complicated, because a lot of functions are called by using function pointers.
Any other ideas?
You can first compile your project with debug information (gcc -g) and use objdump to get which source files were included.
objdump -W <some_compiled_binary>
Dwarf format should contain the information you are looking for.
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
< c> DW_AT_producer : (indirect string, offset: 0x5f): GNU C 4.4.3
<10> DW_AT_language : 1 (ANSI C)
<11> DW_AT_name : (indirect string, offset: 0x28): test_3.c
<15> DW_AT_comp_dir : (indirect string, offset: 0x36): /home/auselen/trials
<19> DW_AT_low_pc : 0x82f0
<1d> DW_AT_high_pc : 0x8408
<21> DW_AT_stmt_list : 0x0
In this example, I compiled an object file from test_3.c, located in the .../trials directory. You would then, of course, need to write a script around this to collect the related source file names.
First you need to separate the debug symbols from the binary you just compiled. Check this question on how to do so:
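The script around this could start from a filter like the following minimal sketch in C. It assumes the objdump -W text layout shown above: each new DIE begins with a line containing "Abbrev Number:", and the compile unit's name sits on its DW_AT_name line after the last ": ". struct cu_scanner and cu_scan_line are hypothetical names; you would feed the objdump output in line by line (for example with fgets) and collect every name it reports. Restricting the dump to .debug_info (objdump --dwarf=info) keeps similar-looking lines from other sections out of the way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state: are we inside the attributes of a compile-unit DIE? */
struct cu_scanner { int in_cu; };

/* Feed one line of objdump debug-info output. If it is the DW_AT_name of
 * a compile unit, copy the name into out and return 1; otherwise 0. */
static int cu_scan_line(struct cu_scanner *s, const char *line,
                        char *out, size_t out_len)
{
    assert(out_len > 0);
    if (strstr(line, "Abbrev Number:") != NULL) {
        /* A new DIE starts; only compile units are of interest. */
        s->in_cu = strstr(line, "DW_TAG_compile_unit") != NULL;
        return 0;
    }
    if (!s->in_cu || strstr(line, "DW_AT_name") == NULL)
        return 0;
    /* The value follows the last ": " (handles "(indirect string, ...): x"). */
    const char *name = NULL;
    for (const char *p = strstr(line, ": "); p != NULL; p = strstr(p + 1, ": "))
        name = p + 2;
    if (name == NULL)
        return 0;
    snprintf(out, out_len, "%.*s", (int)strcspn(name, "\r\n"), name);
    s->in_cu = 0; /* only the first DW_AT_name belongs to the unit */
    return 1;
}
```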
How to generate gcc debug symbol outside the build target?
Then you can try to parse this file on your own. I know how to do so for Visual Studio but as you are using GCC I won't be able to help you further.
Here is an idea, need to refine based on your specific build. Make a build, log it using script (for example script log.txt make clean all). The last (or one of the last) step should be the linking of object files. (Tip: look for cc -o <your_binary_name>). That line should link all .o files which should have corresponding .c files in your tree. Then grep those .c files for all the included header files.
If you have duplicate names in your .c files in your tree, then we'll need to look at the full path in the linker line or work from the Makefile.
What Mahmood suggests below should work too. If you have an image with symbols, strings <debug_image> | grep <full_path_of_src_directory> should give you a list of C files.
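The linker-line idea can be sketched in C as well. Assuming every object on the line was compiled from a .c file with the same stem (true for simple Makefiles, not for generated sources), the hypothetical helpers print_sources and object_to_source turn a copied linker command line into the list of candidate source files; grepping those files for #include lines is then a job for the shell.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If tok names an object file ("dir/foo.o"), write the assumed matching
 * source name ("dir/foo.c") into out and return 1; otherwise return 0. */
static int object_to_source(const char *tok, char *out, size_t out_len)
{
    assert(tok != NULL);
    size_t n = strlen(tok);
    if (n < 3 || strcmp(tok + n - 2, ".o") != 0)
        return 0;
    snprintf(out, out_len, "%.*s.c", (int)(n - 2), tok);
    return 1;
}

/* Walk a writable copy of a linker command line (strtok modifies it),
 * print the guessed source for every object and return how many. */
static int print_sources(char *cmdline)
{
    int count = 0;
    char src[256];
    for (char *tok = strtok(cmdline, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (object_to_source(tok, src, sizeof src)) {
            puts(src);
            count++;
        }
    }
    return count;
}
```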
You can use the Unix nm tool. It lists all the symbols of an object file. So you need to:
Run nm on your binary and grab all undefined symbols.
Run ldd on your binary to get the list of all its dynamic dependencies (the .so files your binary is linked against).
Run nm on each .so file you found in step 2.
That will give you the full list of dynamic symbols that your binary use.
Example:
nm -C --dynamic /bin/ls
....skipping.....
00000000006186d0 A _edata
0000000000618c70 A _end
U _exit
0000000000410e34 T _fini
0000000000401d88 T _init
U _obstack_begin
U _obstack_newchunk
U _setjmp
U abort
U acl_extended_file
U bindtextdomain
U calloc
U clock_gettime
U closedir
U dcgettext
U dirfd
All the symbols marked with a capital "U" are undefined in ls itself, i.e. they are used by ls and resolved from its shared libraries.
If your goal is to analyze C source files, you can do that by customizing the GCC compiler. You could use MELT for that purpose (MELT is a high-level domain-specific language to extend GCC) by adding your own analysis passes, coded in MELT, inside GCC, but you should first learn about GCC's middle-end internal representations (Gimple, Tree, ...).
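Filtering those "U" lines is easy to script. A minimal sketch in C, assuming the nm output layout above, where an undefined symbol has no address column, so after the padding the line starts directly with the type letter. nm_undefined_symbol is a hypothetical helper, and other type letters (including the weak w) are ignored here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If `line` is an undefined-symbol line from nm ("   U name"), copy the
 * symbol name into out and return 1; otherwise return 0. */
static int nm_undefined_symbol(const char *line, char *out, size_t out_len)
{
    assert(out_len > 0);
    while (isspace((unsigned char)*line))   /* skip the empty address column */
        line++;
    if (line[0] != 'U' || !isspace((unsigned char)line[1]))
        return 0;                           /* defined symbol or other type */
    line++;
    while (isspace((unsigned char)*line))
        line++;
    size_t len = strcspn(line, "\r\n");     /* name may contain spaces with -C */
    if (len == 0)
        return 0;
    snprintf(out, out_len, "%.*s", (int)len, line);
    return 1;
}
```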
Customizing GCC takes several days of work (mostly because GCC internals are quite complex in the details).
Feel free to ask me more about MELT.

link c function in nasm

I have a NASM project and I'm calling a C function from it.
I put the name of the function in "extern",
and when linking I put all the objects together, but I get an "undefined reference to" error.
Here is my compile/link command:
gcc -o Project4 Project4.o array1c.c readdouble.o writedouble.o readarray.o printarray.o addarray.o invertarray.o invertarray2.o invertarray3.o averagearray.o quicksort.c
I would first compile all of your .c files using the "gcc -c" command into object files, then link those resulting .o files (such as "array1c.o" and "quicksort.o") together with your other pre-existing object files and see if that still gives you an undefined reference. That may be an unnecessary step, but I've never combined raw .c files and .o files in a single call to gcc.
You may also have to add an underscore to the beginning of any C functions you call... I know this can be a platform-dependent thing (i.e., Linux typically doesn't need underscores on C functions, whereas OS X and some other UNIX platforms do).
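One way to sidestep the underscore question, assuming the C side is compiled with GCC (or Clang), is GCC's asm-label extension: it fixes the exact assembler-level name of a C function, so the NASM file can use that same name in its extern line on every platform. sum_ints is a hypothetical function standing in for whatever array1c.c provides; the NASM side would then say extern sum_ints.

```c
#include <assert.h>
#include <stddef.h>

/* GCC asm label: the assembler symbol is exactly "sum_ints", with no
 * leading underscore added even on platforms that normally prepend one.
 * sum_ints is a hypothetical example function. */
int sum_ints(const int *a, int n) __asm__("sum_ints");

int sum_ints(const int *a, int n)
{
    assert(n >= 0 && (n == 0 || a != NULL));
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Without the label, the NASM code on a leading-underscore platform would need extern _sum_ints instead.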
Lastly you could try, using ld, to just link all the object files together at once rather than linking some of the object files together into Project4.o, and then linking that to what you had assembled using nasm (at least that's what I'm assuming you're doing, i.e., you're making a Project4.o, and then calling functions from that in your assembly code).
Hope this helps,
Jason
