Intel hex format and position independent code using gcc - arm

I'm not sure if this is specific to the processor I'm using, so for what it's worth I'm using a Cortex M0+. I was wondering: if I generate a hex file through gcc using -fPIC, I produce...Position Independent Code. However, the intel hex file format that I get out of objcopy always has address information on each line's header. If I'm trying to write a bootloader, do I just ignore that information, skip the bytes relating to it, and load the actual code into memory wherever I want, or do I have to keep track of it somehow?

The Intel HEX format was designed for programming PROMs, EPROMs, or processors with an internal EPROM, and it is normally used with programmers for these devices. The addresses at the beginning of the records have little to do with the program code itself. They indicate the address in the PROM at which the data will be written. Remember also that the PROM can be mapped anywhere into the processor's address space, so the final address can change anyway.
If you are not programming a PROM, strip everything except the data bytes from each record. (Don't forget to drop the checksum at the end ;-)
As I understand the Intel HEX format, the records need not be contiguous; there may be holes between them.
Some remarks:
The -fPIC option is not responsible for the Intel HEX format. I think you'll find -O ihex somewhere in your command lines. If you want a file that can be executed directly, objcopy provides better-suited output formats.
As long as you don't write the earlier stages of the boot process yourself, you don't load your bootloader; it will be loaded for you. The address at which this happens is normally fixed and cannot be changed. So there is no need for position-independent code, but it doesn't hurt either.

Related

Create a fixed size section with gcc and place values in it

I need to embed a binary file within an executable generated with gcc on Linux, to be executed on the host (not on a separate device).
In addition, I want to be able to change that binary content externally by using objcopy --update-section.
I could do that with __attribute__((section("..."))), but the problem is that the binary file might have different sizes at different times, so I want to allocate a section of a fixed maximum size. That way, I can update it with slightly bigger or smaller binaries in the future.
Apart from the above, I would like to give a default value to that particular section at build time (a predefined binary file that is available at build time).
This can be done with a linker script. However, as far as I understand, I would need to modify the OS default linker script, which I want to avoid.
The only thing that comes to my mind is to create an array on that section with a fixed size, using the first bytes for allocating the default binary file and padding the rest with 0xFF's for instance.
Is there a better way to do this?
As ikegami has mentioned, it's enough to specify the maximum size of the array and then initialise the values you need.
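A minimal sketch of the fixed-size array approach, assuming GCC on an ELF target such as Linux. The section name .blobdata, the 4096-byte capacity, and the four default bytes are hypothetical placeholders, not values from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity and section name, chosen only for illustration. */
#define BLOB_MAX_SIZE 4096

/* The array length fixes the size of the section. Elements without an
   explicit initializer are zero-filled by the compiler. */
__attribute__((section(".blobdata"), used))
unsigned char blob_area[BLOB_MAX_SIZE] = {
    0xDE, 0xAD, 0xBE, 0xEF   /* stand-in for the default binary content */
};

/* Fail the build if the reserved area ever stops matching the constant. */
static_assert(sizeof blob_area == BLOB_MAX_SIZE, "blob area size mismatch");

/* Returns the number of bytes reserved for the embedded blob. */
size_t blob_capacity(void)
{
    return sizeof blob_area;
}
```

With this layout, something like objcopy --update-section .blobdata=new.bin prog prog.patched should replace the contents. The replacement file probably needs to be padded to exactly the reserved size, since changing the size of a loaded section is likely to fail or disturb the layout.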

Generating load time serial number for PCB application

I am trying to generate an incrementing value at load time to be used to "serialize" a PCB with a unique code value. Not an expert in ld or preprocessor commands, so looking for some help.
The value will be used in a unique ID for each board that the code is loaded on and will also be used as a counter for boards in the field.
I have no preconceived idea of how I might accomplish this, so any workable answer to get me started, including a pre-preprocessor macro is fine. In my olden days, I recollect adding a couple lines to the linker file that would accomplish this, but I have been unable to resurrect that information anywhere (including my brain's memory cells).
The simpler the answer, the better.
My solution to the problem was remarkably simple.
The binary contained
const char *serial = "XY-00000";
I then wrote a short program that boiled down to:
char uniqueserial[8];
/* Generate serial - this was an SQL call to the manufacturing DB */
char *array;
size_t arraylen;
/* Read binary into array and store its size in arraylen */
/* memmem() takes the haystack length as its second argument */
memcpy(memmem(array, arraylen, "XY-00000", 8), uniqueserial, 8);
/* Write array to temp bin file for flashing */
This depends on the serial template string being unique in the binary. Use the strings command to check. I disable CRC-protected object files as a matter of taste; I like my embedded binaries to be exact memory dumps.
The linker is not the right place, for two reasons:
The same executable can be loaded into several devices with the same ID, which defeats your approach.
You would have to relink the executable for each device you program, which wastes CPU resources.
The best approach is to patch the executable with the serial number at load time.
Choose a data pattern as a token (one unlikely to occur elsewhere in your program binary) and initialize your serial number variable to it, preferably by statically initializing an array variable or something similar.
Write a program, to be run on each download, that searches the executable file for the pattern and overwrites it with the correct value before the binary is loaded into the device. Beware that you are patching a binary, so you cannot use variable-length strings or the like; that would trash the work done by the linker.
Once the binary executable is patched, you can download it to the device.
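The patching step can be sketched as follows. This is an illustration under stated assumptions, not the answerer's actual tool: the function names find_pattern and patch_serial are hypothetical, and the image is assumed to be already read into memory. It uses a plain search loop rather than memmem(), which is a GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of pat in buf, or NULL. */
unsigned char *find_pattern(unsigned char *buf, size_t buf_len,
                            const void *pat, size_t pat_len)
{
    if (pat_len == 0 || buf_len < pat_len)
        return NULL;
    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return buf + i;
    }
    return NULL;
}

/* Overwrites the placeholder in image with serial, which must have the
   same length so that no byte in the image moves.
   Returns 0 on success, -1 if the placeholder is absent, and -2 if it
   occurs more than once (in which case nothing is modified). */
int patch_serial(unsigned char *image, size_t image_len,
                 const char *placeholder, const char *serial)
{
    size_t n = strlen(placeholder);
    assert(strlen(serial) == n);

    unsigned char *hit = find_pattern(image, image_len, placeholder, n);
    if (hit == NULL)
        return -1;

    /* Search the remainder to make sure the token is unique. */
    size_t rest = image_len - (size_t)(hit - image) - 1;
    if (find_pattern(hit + 1, rest, placeholder, n) != NULL)
        return -2;

    memcpy(hit, serial, n);
    return 0;
}
```

Refusing to patch when the token appears twice is a design choice made here; it guards against overwriting unrelated bytes that happen to match the pattern.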
Another solution is to reserve a fixed area in your linker script for all this kind of information and put all your device information variables there. Then get the exact positions in ROM of the individual variables and include the proper data in the loaded image. In this case the linker is your friend: it reserves a fixed segment in your device's ROM for storing the device's individual data (you can put MAC addresses, serial numbers, default configuration, etc. there).

writing hex file to RAM in ARM Cortex-M

I am doing an ongoing project to write a simplified OS for hobby/learning purposes. I can generate hex files, and now I want to write a script on the chip to accept them via serial, load them into RAM, then execute them. For simplicity I'm writing in assembly so that all of the startup code is up to me. Where do I start here? I know that the hex file format is well documented, but is it as simple as reading the headers for each line, aligning the addresses, then putting the data into RAM and jumping to the address? It sounds like I need a lot more than that, but this is a problem that most people don't even try to solve. Any help would be great.
This is too vague: there are many different file formats, and at least two really popular ones that store the data as hex text, so "hex file" doesn't tell us much.
Writing a script on the chip implies that you have an operating system running on your microcontroller. Which operating system is it, and what does the command line look like?
Assembly is not required to control everything completely (that is, bare metal). You can use asm to bootstrap C and then write the rest in C; that is not a problem.
Do you want to download to RAM and run, or download and then burn to flash so that you can reset into the program?
Basically, what you are making is a bootloader. And yes, we write bootloaders all the time, one for each platform, sometimes borrowing code from a prior platform and sometimes not.
First, on your development computer (Windows, Mac, Linux, whatever), write a program (ideally in C or Pascal, something you can easily port to the microcontroller) that reads the whole file into an array, then write some code that accepts one byte at a time, as you would if you were receiving it serially. Parse whatever file format you choose (you can change formats later if you decide you no longer like it). Use real programs that you have built; the disassembler or other tools should have output options that show which bytes or words should land at which addresses. Parse the data, printf the address/byte or address/word items you find, and compare them with what the toolchain showed. Then carve the parser out and replace the printf with a write to memory at that address. And yes, you then jump to the entry point, if you can determine it, or you as the designer decide that all programs must have a specific entry point.
Intel HEX and Motorola S-record are good choices (-O ihex or -O srec; my current favorite is --srec-forceS3 -O srec). Both are documented on Wikipedia, and at least the GNU tools will produce both. You can use a dumb terminal program (minicom) to dump the file into your microcontroller and, hopefully, parse it and write to RAM as it comes in. If you can't handle that flow, you might consider a raw binary (-O binary) and implement an XMODEM receiver in your bootloader.
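As a starting point for the host-side parser described above, here is a rough sketch of decoding a single Intel HEX record. The struct and function names (ihex_record, ihex_parse_line) are hypothetical. It handles one line at a time and only validates the record; what to do with each record type is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded record; the field names are illustrative. */
struct ihex_record {
    uint8_t  length;     /* number of data bytes */
    uint16_t address;    /* 16-bit load offset from the record header */
    uint8_t  type;       /* 0x00 data, 0x01 end of file, 0x04 extended linear address */
    uint8_t  data[255];
};

/* Converts one hex digit to its value, or returns -1. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Reads two hex digits at s as one byte, or returns -1.
   A premature end of string is rejected before s[1] is read. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return -1;
    return (hi << 4) | lo;
}

/* Parses one ":LLAAAATT<data>CC" line. Returns 0 on success, or -1 on
   a malformed line or a checksum mismatch. */
int ihex_parse_line(const char *line, struct ihex_record *rec)
{
    assert(line != NULL && rec != NULL);
    if (line[0] != ':') return -1;

    const char *p = line + 1;
    int header[4];
    uint8_t sum = 0;

    for (int i = 0; i < 4; i++) {
        header[i] = hex_byte(p);
        if (header[i] < 0) return -1;
        sum = (uint8_t)(sum + header[i]);
        p += 2;
    }
    rec->length  = (uint8_t)header[0];
    rec->address = (uint16_t)((header[1] << 8) | header[2]);
    rec->type    = (uint8_t)header[3];

    for (int i = 0; i < rec->length; i++) {
        int b = hex_byte(p);
        if (b < 0) return -1;
        rec->data[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
        p += 2;
    }

    int checksum = hex_byte(p);
    if (checksum < 0) return -1;
    /* All bytes, including the checksum, must sum to zero modulo 256. */
    return (uint8_t)(sum + checksum) == 0 ? 0 : -1;
}
```

On the host, a caller could printf rec.address and rec.data for every type 0x00 record and compare against the toolchain's listing. On the target, the same loop would copy the data to RAM instead, adding the upper 16 address bits from the most recent type 0x04 record if the file uses them.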

Why does inserting characters into an executable binary file cause it to "break"?

Why does inserting characters into an executable binary file cause it to "break" ?
And, is there any way to add characters without breaking the compiled program?
Background
I've known for a long time that it is possible to use a hex editor to change code in a compiled executable file and still have it run as normal...
Example
As an example in the application below, Facebook could be changed to Lacebook, and the program will still execute just fine:
But it Breaks with new Characters
I'm also aware that if new characters are added, it will break the program and it won't run, or it will crash immediately. For example, adding My in front of Facebook would achieve this:
What I know
I've done some work with C and understand that code is written in human-readable form, compiled, and linked into an executable file.
I've done introductory studies of assembly language and understand the concepts about data, commands, and pointers being moved around
I've written small programs for Windows, Mac and Linux
What I don't know
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
I understand why having extra characters in a text string of the binary file would cause problems
What I'd like to know
Why do the extra characters cause the program to break?
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
I don't quite understand the relationship between the operating system and the executable file. I'd guess that when you type in the name of the program and press return you are basically instructing the operating system to "execute" that file, which basically means loading the file into memory, setting the processor's pointer to it, and telling it 'Go!'
Modern operating systems just map the file into memory. They don't bother loading pages of it until it's needed.
Why do the extra characters cause the program to break?
Because they put all the other information in the file in the wrong place, so the loader winds up loading the wrong things. Also, jumps in the code wind up being to the wrong place, perhaps in the middle of an instruction.
What thing determines that the program is broken? The OS? Does the OS also keep this program sandboxed so that it doesn't crash the whole system nowadays?
It depends on exactly what gets screwed up. It may be that you move a header and the loader notices that some parameters in the header have invalid data.
Is there any way to add in extra characters to a text string of a compiled program via a hex editor and not have the application break?
Probably not reliably. At a minimum, you'd need to reliably identify sections of code that need to be adjusted. That can be surprisingly difficult, particularly if someone has attempted to make it so deliberately.
When a program is compiled into machine code, it includes many references to the addresses of instructions and data in the program memory. The compiler determines the layout of all the memory of the program, and puts these addresses into the program. The executable file is also organized into sections, and there's a table of contents at the beginning that contains the number of bytes in each section.
If you insert something into the program, the address of everything after that is shifted up. But the parts of the program that contain references to the program and data locations are not updated, they continue to point to the original addresses. Also, the table that contains the sizes of all the sections is no longer correct, because you increased the size of whatever section you modified.
The format of a machine-language executable file is based on hard offsets, rather than on parsing a byte stream (like textual program source code). When you insert a byte somewhere, the file format continues to reference information which follows the insertion point at the original offsets.
Offsets may occur in the file format itself, such as the header which tells the loader where things are located in the file and how big they are.
Hard offsets also occur in the machine language itself, such as in instructions that refer to the program's data or in branch instructions.
Suppose an instruction says "branch 200 bytes down from where we are now", and you insert a byte into those 200 bytes (because a character string happens to be there that you want to alter). Oops; the branch still covers 200 bytes.
On some machines, the branch couldn't even be 201 bytes even if you fixed it up because it would be misaligned and cause a CPU exception; you would have to add, say, four bytes to patch it to 204 (along with a myriad other things needed to make the file sane).
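The effect of hard offsets can be shown with a toy example. The "image" layout below is invented purely for illustration (byte 0 holds the offset of a message, byte 1 its length); real executable formats are far more involved, but they break for the same reason.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy image: byte 0 = offset of a message, byte 1 = its length.
   This layout is hypothetical and exists only for this sketch. */
#define IMAGE_MAX 32

/* Copies the message referenced by the header into out (NUL-terminated).
   out must have room for at least IMAGE_MAX bytes. */
void read_message(const unsigned char *image, char *out)
{
    unsigned char off = image[0];
    unsigned char len = image[1];
    assert((size_t)off + len <= IMAGE_MAX && len < IMAGE_MAX);
    memcpy(out, image + off, len);
    out[len] = '\0';
}

/* Inserts byte c at position pos and shifts everything after it up by
   one, without touching the header. This is what a naive hex-editor
   insert does. used is the number of bytes currently occupied. */
void insert_byte(unsigned char *image, size_t used, size_t pos, unsigned char c)
{
    assert(used < IMAGE_MAX && pos <= used);
    memmove(image + pos + 1, image + pos, used - pos);
    image[pos] = c;
}
```

Overwriting a byte of the message in place leaves the header's offset valid, so read_message still returns a sensible string. After insert_byte at any position before the message, the header still points at the old offset and read_message returns bytes that straddle the old boundary, unless the header is fixed up as well.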

Why does an EXE file that does *nothing* contain so many dummy zero bytes?

I've compiled a C file that does absolutely nothing (just a main that returns... not even a "Hello, world" gets printed), and I've compiled it with various compilers (MinGW GCC, Visual C++, Windows DDK, etc.). All of them link with the C runtime, which is standard.
But what I don't get is: When I open up the file in a hex editor (or a disassembler), why do I see that almost half of the 16 KB is just huge sections of either 0x00 bytes or 0xCC bytes? It seems rather ridiculous to me... is there any way to prevent these from occurring? And why are they there in the first place?
Thank you!
Executables in general contain a code segment and at least one data segment. I guess each of these has a standard minimum size, which may be 8K. And unused space is filled up with zeros. Note also that an EXE written in a higher level (than assembly) language contains some extra stuff on top of the direct translation of your own code and data:
startup and termination code (in C and its successors, this handles the input arguments, calls main(), then cleans up after exiting from main())
stub code and data (e.g. Windows executables contain a small DOS program stub whose only purpose is to display the message "This program is not executable under DOS").
Still, since executables are usually supposed to do something (i.e. their code and data segments do contain useful stuff), and storage is cheap, by default no one optimizes for your case :-)
However, I believe most of the compilers have command line parameters with which you can force them to optimize for space - you may want to check the results with that setting.
Here are more details on the EXE file formats.
As it turns out, I should've been able to guess this beforehand... the answer was the debug symbols and code; those were taking up most of the space. Not compiling with /DEBUG and /PDB (which I always do by default) reduced the 13 K down to 3 K.

Resources