Difference between using objcopy and xxding the file into a c source - c

Say I want to embed a file called data in my C executable.
The first result on Google is this Linux Journal page, which says to use objcopy like this:
objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 data data.o
However, this depends on the target architecture. For example, when I link the object produced by the previous command, the linker reports "i386 architecture of input file 'data.o' is incompatible with i386:x86-64 output" and I have to change the arguments.
With the Unix tool xxd, however, I can simply generate a C source file that holds the data in an unsigned char array plus an integer with its length, and obtain the same result with device-independent compilation commands:
data.o: data.c
	gcc -c data.c
data.c: data
	xxd -i data > data.c
What is the preferred method and why?

xxd is not a standard UNIX tool. It is actually part of Vim, where it implements the hex editor function. Vim is an optional tool and is not universally available.
GNU objcopy, on the other hand, is part of GNU binutils and is generally preinstalled on GNU systems.
In general, when one needs to include a binary file in a program, something simple (as you do with xxd) is preferred over objcopy, mainly because objcopy is heavily under-documented and leaves the impression of being an unpolished front end to BFD, the underlying library of binutils. Another reason is that along with the .c file you can also create the .h file, and make the generated files an integral part of your project.
The article you link to already contains a number of examples of how to accomplish that. Probably the most popular tool for the purpose is hexdump, which is preinstalled on most systems. For example, off the top of my head:
# .c
echo '#include <stddef.h>' > data.c
echo 'unsigned char data[] = {' >> data.c
hexdump -v -e '1/1 "0x%02X,"' < data.bin >> data.c
echo >> data.c
echo '};' >> data.c
echo 'size_t data_size = sizeof(data);' >> data.c
# .h
echo '#include <stddef.h>' > data.h
echo 'extern unsigned char data[];' >> data.h
echo 'extern size_t data_size;' >> data.h
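Once the generated files exist, the rest of the program treats the blob like any other array. The following is only a minimal sketch: the four sample bytes stand in for the generated initializer, the array is assumed to be declared unsigned char, and sum_bytes is a hypothetical consumer that does not appear in the answer.

```c
#include <stddef.h>

/* Stand-in for the generated data.c. A real build would compile the
   generated file instead of these four sample bytes. */
unsigned char data[] = { 0x48, 0x69, 0x21, 0x0A };
size_t data_size = sizeof(data);

/* Hypothetical consumer: adds up all bytes of an embedded blob. */
unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

A caller would include data.h and pass data and data_size to such a function. Because data_size is an ordinary variable, no linker-specific symbols are needed.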

Related

Where are avr-gcc libraries stored?

I'm trying to locate the .c files that are related to the #include header files in avr.
I want to have a look at some of the standard libraries that are defined in the avr-gcc library, particularly the PORT definitions contained in <avr/io.h>. I searched through the library in /usr/lib/avr/include/avr and found the header file, however what I am looking for is the .c file. Does this file exist? If so, where can I find it? If not, what is the header file referencing?
The compiler-provided libraries are precompiled object code stored in static libraries. In gcc, libraries conventionally have the extension .a (for "archive", largely for historic reasons) and the prefix "lib".
At build time, the linker searches the library archives to find the object-code modules necessary to resolve references to library symbols. It extracts the required modules and links them into the binary image being built.
In gcc, a library libXXX.a is typically linked using the command line switch -lXXX, so the libXXX.a naming convention is important in that case. For example, the standard C library libc.a is linked by the switch -lc.
So to answer your question, there are normally no .c files for the compiler-provided libraries shipped with the toolchain. The library need not even have been written in C.
That said, being open source, the source files (.c or otherwise) will be available from the repositories of the various libraries. For example, for the standard C library: https://www.nongnu.org/avr-libc/.
For other AVR architecture and I/O support libraries, you might inspect the associated header files or documentation. The header files will typically have a boiler-plate comment with a project URL for example.
PORTB and other special function registers are usually defined as macros in headers provided by avr-libc. Find your include/avr directory (the one that contains io.h). In that directory, there should be many other header files. As an example, iom328p.h contains the following line that defines PORTB on the ATmega328P:
#define PORTB _SFR_IO8(0x05)
If you are also looking for the libraries that are distributed as .a files, you should run avr-gcc -print-search-dirs.
There are several ways to find out where the system headers are located and which are included:
avr-gcc -v -mmcu=atmega8 foo.c ...
With option -v, GCC will print (amongst other things) which include paths it is using. Check the output on a shell / console, where GCC will print the search paths:
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/avr/5.4.0/include
/usr/lib/gcc/avr/5.4.0/include-fixed
/usr/lib/gcc/avr/5.4.0/../../../avr/include
The last location is for AVR-LibC, which provides avr/io.h. Resolving the ..s, that path is just /usr/lib/avr/include. These paths depend on how avr-gcc was configured and installed, hence you have to run that command with your installation of avr-gcc.
avr-gcc -H -mmcu=atmega8 foo.c ...
Suppose the C-file foo.c reads:
#include <avr/io.h>
int main (void)
{
PORTD = 0;
}
for an easy example. With -H, GCC will print out which files it is actually including:
. /usr/lib/avr/include/avr/io.h
.. /usr/lib/avr/include/avr/sfr_defs.h
... /usr/lib/avr/include/inttypes.h
.... /usr/lib/gcc/avr/5.4.0/include/stdint.h
..... /usr/lib/avr/include/stdint.h
.. /usr/lib/avr/include/avr/iom8.h
.. /usr/lib/avr/include/avr/portpins.h
.. /usr/lib/avr/include/avr/common.h
.. /usr/lib/avr/include/avr/version.h
.. /usr/lib/avr/include/avr/fuse.h
.. /usr/lib/avr/include/avr/lock.h
avr-gcc -save-temps -g3 -mmcu=atmega8 foo.c ...
With -g3 (debug level 3), the macro definitions are recorded in the debug info, and with -save-temps they are also visible in the pre-processed file (*.i for C code, *.ii for C++, *.s for pre-processed assembly). Hence, in foo.i we can find the definition of PORTD as
#define PORTD _SFR_IO8(0x12)
Starting from the line which contains that definition, scroll up until you find the annotation that tells in which file the macro definition happened. For example
# 45 "/usr/lib/avr/include/avr/iom8.h" 3
in the case of my toolchain installation. This means that the lines following that annotation follow line 45 of /usr/lib/avr/include/avr/iom8.h.
If you want to see the resolution of PORTD, scroll down to the end of foo.i which contains the pre-processed source:
# 3 "foo.c"
int main (void)
{
(*(volatile uint8_t *)((0x12) + 0x20)) = 0;
}
0x12 is the I/O address of PORTD, and 0x20 is the offset between I/O addresses and RAM addresses for ATmega8. This means the compiler may implement PORTD = 0 by means of out 0x12, __zero_reg__.
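The address arithmetic behind that expansion can be checked on a host machine. This is only an illustrative sketch: SFR_OFFSET and sfr_io8_mem_addr are hypothetical names, not part of AVR-LibC, and the offset 0x20 is the ATmega8 value quoted above.

```c
#include <stdint.h>

/* Offset between I/O addresses and data-memory addresses, as stated
   in the text for the ATmega8. */
#define SFR_OFFSET 0x20u

/* Host-side illustration: the data-memory address through which an
   I/O register at io_addr is accessed. */
uint16_t sfr_io8_mem_addr(uint8_t io_addr)
{
    return (uint16_t)(io_addr + SFR_OFFSET);
}
```

For PORTD, the I/O address 0x12 maps to the data-memory address 0x32, which is the address the cast in foo.i dereferences.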
avr-gcc -print-file-name=libc.a -mmcu=...
Finally, this command will print the location (absolute path) of libraries like libc.a, libm.a, libgcc.a or lib<mcu>.a. The location of the library depends on how the compiler was configured and installed, but also on command line options like -mmcu=.
avr-gcc -Wl,-Map,foo.map -mmcu=atmega8 foo.c -o foo.elf
This directs the linker to dump a "map" file foo.map where it reports which symbol will drag which module from which library. This is a text file that contains lines like:
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/crtatmega8.o
...
LOAD /usr/lib/gcc/avr/5.4.0/avr4/libgcc.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libm.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libc.a
LOAD /usr/lib/gcc/avr/5.4.0/../../../avr/lib/avr4/libatmega8.a
libgcc.a is from the compiler's C runtime, and all the others are provided by AVR-LibC. Resolving the ..s, the AVR-LibC files for ATmega8 are located in /usr/lib/avr/lib/avr4/.

UNIX: C: cc compiler. Command Line: How to display the binary file header on screen

I am looking for the Unix command to display the header portion, in hex, of any executable that has been compiled by the cc compiler.
I had this once and now I can't remember it.
I just want to see the code that the compiler places at the start of any C program that I compile.
I am aware that I can use 'hexdump [filename]', however that doesn't isolate the header portion.
I hope I have explained myself well enough.
The command readelf is available on most Linux systems and has the ability to display many parts of an ELF file. You can use readelf -H to get a short synopsis of the various options.
To get just the file header, you can use readelf -h or readelf --file-header.
To see it in hex, you can use the command xxd. Given that the ELF header is 64 bytes for a 64-bit ELF file (52 bytes for a 32-bit one), you can use xxd -l 64 followed by the file name.
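How many bytes to dump can also be decided from the file itself, since the fifth identification byte records whether the file is 32-bit or 64-bit. The function below is a minimal sketch with a hypothetical name (elf_header_size). It only inspects the first five bytes and does not validate the rest of the header.

```c
#include <stddef.h>

/* Returns the size of the ELF file header in bytes (52 for 32-bit,
   64 for 64-bit files), or 0 if buf does not start with a valid
   ELF identification. */
size_t elf_header_size(const unsigned char *buf, size_t len)
{
    if (len < 5)
        return 0;
    /* Magic number: 0x7F 'E' 'L' 'F' */
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    switch (buf[4]) {      /* e_ident[EI_CLASS] */
    case 1:  return 52;    /* ELFCLASS32 */
    case 2:  return 64;    /* ELFCLASS64 */
    default: return 0;
    }
}
```

A caller would read the first few bytes of the executable into a buffer, call the function, and then dump that many bytes in hex.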
The objdump command in Linux provides detailed information about object files. It is mainly used by programmers who work on compilers, but it is also a handy debugging tool for other programmers. The examples below show how to use it.
Basic syntax of objdump is :
objdump [options] objfile...
There is a wide range of options available for this command.
In the examples below, factorial is a C program that I have compiled.
1. Display object-format-specific file header contents using the -p option
The following example prints the object file format specific information.
$ objdump -p factorial
2. Display the contents of all headers using the -x option
Information related to all the headers in the object file can be retrieved using the -x option.
$ objdump -x factorial

binutils - kernel - "_binary" meaning?

I am reading xv6 lectures.
I have a file named initcode.S that is to be linked in the kernel.
Now two symbols are created that way :
extern char _binary_initcode_start[], _binary_initcode_size[];
inside a function.
The lecture says :
as part of the kernel build process, the linker embeds that binary and defines two special symbols, _binary_initcode_start and _binary_initcode_size, indicating the location and size of the binary.
I understand that binutils is getting the address and the size of this assembled code.
I wonder about the notation : is it default ? my searches didn't prove that clearly.
_binary -> it is originally an assembly code
_initcode -> the name of my file
_start -> the parameter i am interested in.
It would imply that any assembly code compiled would have those variables too.
I have no proof of that, though.
The question is :
is _binary_myAsmFileHere_myParameterhere the default variable structure binutils give to the assembly file to export their address, size and so on ?
Could someone tell me if my assumption is right and if it is better than that : the rule
Thanks
Strangely enough, it doesn't seem to be documented in the ld manual. However, man objcopy does say this:
You can access this binary data inside a program by referencing the
special symbols that are created by the conversion process. These
symbols are called _binary_objfile_start, _binary_objfile_end and
_binary_objfile_size. e.g. you can transform a picture file into an object file and then access it in your code using these symbols.
Apparently the same logic is used by ld when embedding binary files.
Notice that the Makefile for xv6 contains this line for linking the kernel:
$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother
As you can see, it uses -b binary to embed the files initcode and entryother, so the above symbols will be defined during this process.
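The naming rule can be mimicked in a few lines of C. This is a sketch under an assumption: as far as I understand the binutils behavior, the symbol is built from the input file name exactly as given on the command line, with every non-alphanumeric character replaced by an underscore. The function name binary_symbol_name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Builds "_binary_<name>_<suffix>" into out. Every character of the
   input file name that is not alphanumeric becomes '_' (assumed rule,
   see the text above). Returns the symbol length, or -1 if out is
   too small. */
int binary_symbol_name(char *out, size_t cap,
                       const char *file_name, const char *suffix)
{
    char mangled[256];
    size_t i;

    for (i = 0; file_name[i] != '\0' && i < sizeof(mangled) - 1; i++) {
        unsigned char c = (unsigned char)file_name[i];
        mangled[i] = isalnum(c) ? (char)c : '_';
    }
    mangled[i] = '\0';

    int n = snprintf(out, cap, "_binary_%s_%s", mangled, suffix);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Under that assumption, the file initcode with the suffix start gives _binary_initcode_start, and a name such as data.bin would give _binary_data_bin_start. Running nm on the linked output is the reliable way to confirm the actual names.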
When a .global variable is defined in an assembly file, for a C file to be able to reference that variable, the C file has to prepend a '_' to the variable name. This is so the linker can 'link' the name in the C file with the name in the assembly file.

Storing CRC into an AXF/ELF file

I'm currently working on a C program in the LPCXpresso (Eclipse-based) tool-chain on Windows 7, an IDE with gcc targeting an NXP Cortex-M3 microprocessor. It provides a simple way to compile-link-program the microprocessor over JTAG. The result of a build is an AXF file (ELF format) that is loaded by a debug configuration.
The loaded program resides in Flash memory from 0x00000 to 0x3FFFB. I'd like to include a 4-byte CRC-32 at 0x3FFFC to validate the program at start-up. I added another section and use the gcc __attribute__ directive to access that memory location.
uint32_t crc32_build __attribute__ ((section(".text_MFlashCRC")));
To compute and store the CRC-32 value, my plan was to use SRecord with the following post-build steps:
arm-none-eabi-size "${BuildArtifactFileName}"
arm-none-eabi-objcopy -O binary "${BuildArtifactFileName}" "${BuildArtifactFileBaseName}.bin"
checksum -p ${TargetChip} -d "${BuildArtifactFileBaseName}.bin"
../util/srec_cat "${BuildArtifactFileBaseName}.bin" -binary -crop 0 0x3FFFC -fill 0xFF 0x00000 0x3FFFC -crc32-b-e 0x3FFFC -o "${BuildArtifactFileBaseName}.crc.bin" -binary
echo ""
echo "CRC32:"
../util/srec_cat "${BuildArtifactFileBaseName}.crc.bin" -binary -crop 0x3FFFC 0x40000 -o - -hex-dump
This creates a binary with a checksum (necessary for bootloader) and then computes the CRC over the used Flash memory, storing the CRC value at 0x3FFFC.
However, I don't think I can load the binary file using the debugger. There is a built in programming utility with LPCXpresso that can load the modified binary file, however, that doesn't let me debug. I believe I can then try to start a debugging session with the original AXF file using "attach-only" mode, however, this becomes cumbersome.
I've been able to use readelf to inspect the crc32_build variable in the AXF file. Is there a way to edit the variable in the AXF file? Is there an industry-standard approach to inserting a CRC as a post-build step?
There is no industry standard that I am aware of. There are various techniques to do this. I would suggest that you use the crc32_build as an extern in 'C' and define it via a linker script. For instance,
$ cat ld.script
.text : {
_start_crc_region = .;
*(.text);
_end_crc_region = .;
crc32_build = .;
LONG(CALC_CRC);
}
You pass the value CALC_CRC as zero for a first invocation and then relink with the value set. For instance,
$ ld --defsym=CALC_CRC=0 -T ld.script *.o -o phony.elf
$ objcopy -O binary -j sections phony.elf phony.bin # sections means checksum 'areas'
$ ld --defsym=CALC_CRC=`crc32 phony.bin` -T ld.script *.o -o target.elf
I use this technique to add digital signing to images; it should apply equally well to crc values. The linker script allows you to position the variable, which is often important for integrity checks like a CRC, but wouldn't matter for a simple checksum. A linker script also allows you to define symbols for both the start and end of the region. Without a script, you need some elf introspection.
You can of course extend the idea to include init data and other allocated sections. At some point you need to use objcopy to extract the sections and do the integrity check at build time. The sections may have various alignment constraints and you need to mimic this (in phony.bin above) on the host when doing the build time crc calculation.
As a bonus, everything is already done when you generate an srec file.
If you have trouble with --defsym, you can just pre-process the ld.script with sed, awk, perl, python, etc and substitute text with a hex value where CALC_CRC is.
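On the target side, the start-up check then reduces to recomputing the CRC over the region and comparing it with the stored word. The sketch below uses the common CRC-32 variant (reflected polynomial 0xEDB88320, as in zlib and Ethernet). Whether this matches the value produced by your host-side tool, including byte order, is an assumption you need to verify. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value and
   final XOR 0xFFFFFFFF. Slow but table-free, so it costs no flash
   for a lookup table. */
uint32_t crc32_calc(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the CRC over [start, end) equals the stored value. */
int image_is_valid(const uint8_t *start, const uint8_t *end, uint32_t stored)
{
    return crc32_calc(start, (size_t)(end - start)) == stored;
}
```

With the linker script above, start-up code would declare _start_crc_region, _end_crc_region and crc32_build as extern symbols and pass their addresses and the stored value to image_is_valid.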

Which program creates a C array given any file?

I remember seeing in the past a program that would take any file and generate a C array representing that file as output; it would prevent distribution of a separate file in some cases. Which Unix/Linux program does that?
xxd -i
For large files, converting to text and then making the compiler parse it all over again is inefficient and unnecessary. Use objcopy instead:
objcopy -I binary -O elf32-i386 stuff stuff.o
(Adjust the output architecture as necessary for non-x86 platforms.) Then once you link it into your program, you can access it like so:
extern char _binary_stuff_start[], _binary_stuff_end[];
#define SIZE_OF_STUFF (_binary_stuff_end - _binary_stuff_start)
...
foo(_binary_stuff_start[i]);
hexdump -v -e '16/1 "0x%x," "\n"'
would generate a C like array from stdin, but there is no declaration, no braces or good formatting.
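If neither tool produces exactly the output you want, the missing declaration and braces are easy to add in a few lines of C. This is a minimal sketch, not a replacement for xxd: emit_c_array is a hypothetical name, the layout of 12 bytes per line is arbitrary, and len must be greater than zero because an empty initializer is not valid before C23.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes buf as a complete C definition (array plus length variable)
   to out. len must be greater than zero. Returns 0 on success,
   -1 on a write error. */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *buf, size_t len)
{
    fprintf(out, "unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++)
        fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n  " : " ", buf[i]);
    fprintf(out, "\n};\nunsigned int %s_len = %lu;\n",
            name, (unsigned long)len);
    return ferror(out) ? -1 : 0;
}
```

A small main that reads the input file into a buffer and calls emit_c_array with stdout would complete the tool. The trailing comma after the last element is legal in a C initializer.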
I know this is a Unix/Linux question, but anyone viewing this who wants to do the same on Windows can use Bin2H.
The easiest way is:
xxd -i -a filename
