Memory utilization for unwind support (on ARM architecture) - c

I am currently developing software for a SAM7X256 microcontroller in C. The device is running Contiki OS and I am using the YAGARTO toolchain.
While studying the map file (to figure out why the .text region had grown so much) I discovered that several KB of the .text region were assigned to unwind support (see below):
.text 0x00116824 0xee4 c:/toolchains/yagarto/bin/../lib/gcc/arm-none-eabi/4.6.2\libgcc.a(unwind-arm.o)
0x00116c4c _Unwind_VRS_Get
......
0x0011763c __gnu_Unwind_Backtrace
.text 0x00117708 0x1b0 c:/toolchains/yagarto/bin/../lib/gcc/arm-none-eabi/4.6.2\libgcc.a(libunwind.o)
0x00117708 __restore_core_regs
0x00117708 restore_core_regs
....
0x00117894 _Unwind_Backtrace
.text 0x001178b8 0x558 c:/toolchains/yagarto/bin/../lib/gcc/arm-none-eabi/4.6.2\libgcc.a(pr-support.o)
0x00117958 __gnu_unwind_execute
...
0x00117e08 _Unwind_GetTextRelBase
I have looked for information on unwinding and found 1 and 2. However, the following is still unclear to me:
When/why do I need unwinding support?
What part of my code is causing pr-support.o, unwind-arm.o and libunwind.o to be linked?
If applicable, how do I avoid linking the objects listed above?
In case it is necessary, I am including a link to the complete map file.
Thanks in advance for your help
Edit 1:
Adding Linker commands
CC = arm-none-eabi-gcc
CFLAGSNO = -I. -I$(CONTIKI)/core -I$(CONTIKI_CPU) -I$(CONTIKI_CPU)/loader \
-I$(CONTIKI_CPU)/dbg-io \
-I$(CONTIKI)/platform/$(TARGET) \
${addprefix -I,$(APPDIRS)} \
-DWITH_UIP -DWITH_ASCII -DMCK=$(MCK) \
-Wall $(ARCH_FLAGS) -g -D SUBTARGET=$(SUBTARGET)
CFLAGS += $(CFLAGSNO) -O -DRUN_AS_SYSTEM -DROM_RUN -ffunction-sections
LDFLAGS += -L $(CONTIKI_CPU) --verbose -T $(LINKERSCRIPT) -nostartfiles -Wl,-Map,$(TARGET).map
$(CC) $(LDFLAGS) $(CFLAGS) -nostartfiles -o project.elf -lc Project.a

Several parts to this answer:
the unwinding library functions are pulled in from exception "personality routines" (__aeabi_unwind_cpp_pr0 etc.) that are mentioned in exception tables in some of the GCC library function modules.
your map file shows that bpapi.o (a module which contains integer division functions) pulls in this exception code. I don't see this in the latest YAGARTO, but I do see it in _divdi3.o, which is another integer division helper module. I can reproduce the effect of the unwinding code being pulled in by writing a trivial main() that does a 64-bit division.
the general reason for C code having (non-trivial) exception tables is so that C++ exceptions can be thrown "through" the C code when you arbitrarily mix C and C++ code in your application.
functions which can't throw or call throwing functions, should, if they have exception tables at all, only need trivial ones marked as CANTUNWIND, so that the unwinding library isn't pulled in. You'd expect division helpers to be in this category and in fact in CodeSourcery's distribution, _divdi3.o is marked CANTUNWIND.
so the root cause is that YAGARTO's GCC library (libgcc.a) is built inappropriately. Not quite incorrectly, as it should still work, but it's code bloat that you wouldn't expect in an embedded toolchain.
Can you do anything about this? There seems to be no simple way to get the GNU linker to ignore ARM exception sections, even with a /DISCARD/ script - the link to the text section overrides that. But what you can do is add a stub definition for the exception personality routine:
void __aeabi_unwind_cpp_pr0(void) {}
int main(void) { return *(unsigned long long *)0x1000 / 3; }
compiles to 4K using YAGARTO, compared to 14K without the stub. But you might want to investigate alternative GNU tools distributions too.
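A slightly fuller version of the stub approach might look like the sketch below. It assumes a GCC toolchain targeting the ARM EABI. The helper name div_by_three is made up for illustration, and the pr1/pr2 stubs are an assumption: the ARM EHABI defines those personality routines as well, but nothing above shows whether any of your objects reference them. The stubs deliberately do not match the real personality-routine signature, so this is only reasonable in a pure C program where no exception is ever thrown.

```c
#include <stdint.h>

#if defined(__arm__) && defined(__ARM_EABI__)
/* Empty stand-ins for the ARM EHABI personality routines. Defining them
 * here satisfies the references made by exception tables in libgcc, so
 * the linker has no reason to pull the unwinder objects out of libgcc.a.
 * They must never actually be called. */
void __aeabi_unwind_cpp_pr0(void) {}
void __aeabi_unwind_cpp_pr1(void) {} /* assumption: only needed if referenced */
void __aeabi_unwind_cpp_pr2(void) {} /* assumption: only needed if referenced */
#endif

/* Hypothetical helper: a 64-bit division, which on a 32-bit ARM target is
 * expected to be compiled into a call to a libgcc division helper. */
uint64_t div_by_three(uint64_t value)
{
    return value / 3u;
}
```

After linking, the map file should show whether unwind-arm.o, libunwind.o and pr-support.o are still being pulled in.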

GCC has an option that eliminates exception handling.
-fno-exceptions
I'm not familiar enough with YAGARTO to say for sure, but it may have a similar option. In GCC, this option removes the exception-handling overhead at the expense of support for standard exceptions.

Related

How to implement Erlang Driver As Default efficient Implementation

The Erlang Run-Time System (ERTS) has a few drivers written in C that are used to interact with the OS or to access low-level resources. To my knowledge, ERTS compiles these drivers at boot time so that they are ready to be loaded from Erlang code. The driver inet_drv.c is one of them; it handles networking tasks such as creating sockets and listening for or accepting new incoming connections.
I wanted to test this driver manually to get a general view of the default behaviour of ERTS and to learn how to implement drivers efficiently in the future. I followed the Erlang reference manual on implementing drivers, which says: first, write the driver and compile it with a C compiler; second, load the driver from Erlang code using the erl_ddll module; finally, link to the driver from a spawned Erlang process. That seemed simple enough.
So I tried these steps with inet_drv.c. I located the file and tried to compile it with Clang, which is the default C compiler on FreeBSD:
cc inet_drv.c
This failed with an error saying that the file erl_driver.h was not found. The driver includes this header (#include<erl_driver.h>), so I searched for it, added its directory to the cc command with the -I option so that the compiler would look there, and recompiled:
cc inet_drv.c -I/usr/ports....
Then another header was not found, so I repeated the process five or six times until I had added all the required include paths. The result is this command:
cc inet_drv.c \
    -I/usr/ports/lang/erlang/work/otp-OTP-21.3.8.18/erts/emulator/beam \
    -I/usr/local/lib/erlang/usr/include \
    -I/usr/ports/lang/erlang/work/otp-OTP-21.3.8.18/erts/emulator/sys/unix \
    -I/usr/ports/lang/erlang/work/otp-OTP-21.3.8.18/erts/include/internal \
    -I/usr/ports/lang/erlang/work/otp-OTP-21.3.8.18/erts/emulator/sys/common \
    -I/usr/ports/lang/erlang/work/stage/usr/local/lib/erlang/erts-10.3.5.14/include/internal
I was surprised by the result: 13 errors and 7 warnings. The shell output and the error and warning descriptions are in the links below.
My question is: why do these errors occur? What did I do wrong?
Since this driver works perfectly for ERTS networking tasks, it must be compiled without errors by ERTS, which presumably uses the OS C compiler (Clang by default) and adds the include paths as I did. So why did it not work when I tried?
https://ibb.co/bbtFHZ7
https://ibb.co/sF8QsDx
https://ibb.co/Lh9cDCH
https://ibb.co/W5Gcj7g
First things first:
In my knowledge the ERTS compile these drivers at boot time
No, ERTS doesn't compile the drivers. inet_drv.c is compiled as part of Erlang/OTP and linked into the beam.smp binary.
inet_drv is not a typical driver. Quoting the How to Implement a Driver section of the documentation:
A driver can be dynamically loaded, as a shared library (known as a DLL on Windows), or statically loaded, linked with the emulator when it is compiled and linked. Only dynamically loaded drivers are described here, statically linked drivers are beyond the scope of this section.
inet_drv is a statically loaded driver, and as such doesn't need to be loaded with erl_ddll.
On to the compilation errors. All the compiler parameters are automatically added for you when you run make, so if you need to call the compiler manually, better just check the command line that make generated and start from that. Let's look at the build log for the Debian Erlang package. Searching for inet_drv we get this command line (line breaks added):
x86_64-linux-gnu-gcc -Werror=undef -Werror=implicit -Werror=return-type -fno-common \
-g -O2 -fno-strict-aliasing -I/<<PKGBUILDDIR>>/erts/x86_64-pc-linux-gnu -D_GNU_SOURCE \
-DHAVE_CONFIG_H -Wall -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes \
-Wdeclaration-after-statement -DUSE_THREADS -D_THREAD_SAFE -D_REENTRANT -DPOSIX_THREADS \
-D_POSIX_THREAD_SAFE_FUNCTIONS -DBEAMASM=1 -DLIBSCTP=libsctp.so.1 \
-Ix86_64-pc-linux-gnu/opt/jit -Ibeam -Isys/unix -Isys/common -Ix86_64-pc-linux-gnu \
-Ipcre -I../include -I../include/x86_64-pc-linux-gnu -I../include/internal \
-I../include/internal/x86_64-pc-linux-gnu -Ibeam/jit -Ibeam/jit/x86 -Idrivers/common \
-Idrivers/unix -c \
drivers/common/inet_drv.c -o obj/x86_64-pc-linux-gnu/opt/jit/inet_drv.o
Some of it will be different since you're building on FreeBSD, but the principle stands - most of the time you'll want to just run make instead of invoking the compiler directly, but if you need to invoke the compiler, it will be much easier to start with the command line that make generated for you.

Configuring ceedling with mqueue.h and -lrt

I'm writing unit tests for a project in C using Throw The Switch's Ceedling/Unity/CMock combo as the unit testing framework.
I've run into an interesting dilemma where I'm using mqueue.h in one of my unit tests. When the tests compile, I get gcc linker undefined reference errors for mq_open(), mq_close(), etc..
From what I understand, based on this finding, the -lrt flag needs to go at the end of the gcc command, after the sources (and executables?): gcc test_foo.c -lrt. Unfortunately, Ceedling is written to put the flag right after the command (gcc -lrt test_foo.c), and I can't find a way to change the order.
The documentation supplied with Ceedling only covers how to add flags to the gcc command, not how to change the order. I've tried poking around in Ceedling's vast source code, but it's written in Ruby, which I'm unfamiliar with.
So my questions are:
Does the placement of -lrt really affect the linking of mq_*()?
Any thoughts on how to change the placement of the -lrt flag?
Almost 3 years later I had a similar issue. This feature has been added in https://github.com/ThrowTheSwitch/Ceedling/issues/136, but its usage is still not easy to understand from the documentation. I needed to include the math library (which requires the -lm flag at the end of the command) and ended up with the following config section (note the :system: part in particular):
:libraries:
  :placement: :end
  :flag: "${1} " # or "-L ${1}" for example
  :common: &common_libraries []
  :system:
    - -lm
  :test:
    - *common_libraries
  :release:
    - *common_libraries
For some reason Ceedling did not add the flags at all when they were listed under :common: or under a particular build section.

ESP8266: What can I do to overcome "section `.text' will not fit in region `iram1_0_seg'"?

What are general measures against the .text region not fitting into "iram1_0_seg" when linking for the ESP8266 using the xtensa GCC based toolchain?
I guess that the ESP8266's RAM is not big enough to hold certain functions. However, what can I do to move as many functions into flash as possible?
Here is an example of what the linker returns:
/home/user/.arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/bin/xtensa-lx106-elf-gcc -I/home/user/git/esp-open-sdk/sdk/include -I/home/user/git/esp-open-sdk/sdk/include/json -I/home/user/git/mart3/src/RTMain/ESP8266TargetGroup -Os -D__ESP8266__ -std=c99 -pedantic -Wall -Wpointer-arith -pipe -Wno-unused-parameter -Wno-unused-variable -Os -g -O2 -Wpointer-arith -Wundef -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -D__ets__ -DICACHE_FLASH -ffunction-sections -fdata-sections -L/home/user/.arduino15/packages/esp8266/hardware/esp8266/2.0.0/tools/sdk/lib -L/home/user/.arduino15/packages/esp8266/hardware/esp8266/2.0.0/tools/sdk/ld -Teagle.flash.512k0.ld -nostdlib -Wl,--no-check-sections -u call_user_start -Wl,-static -Wl,--gc-sections src/code/CMakeFiles/FX6CodeObj.dir/FX6Generated/src-gen/fxfu___program1.c.obj src/code/CMakeFiles/FX6CodeObj.dir/FX6Generated/src/emptyHello/fxfu___helloart.c.obj src/code/CMakeFiles/FX6CodeObj.dir/FXStd/FXRTMain.c.obj src/code/CMakeFiles/FX6CodeObj.dir/FXStd/NamedList.c.obj -o src/ARTApp/ARTApp.out -Wl,--start-group src/ART/libART.a -lm -lgcc -lhal -lphy -lnet80211 -llwip -lwpa -lmain -lpp -lsmartconfig -lwps -lcrypto -laxtls -Wl,--end-group
/home/user/.arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/bin/../lib/gcc/xtensa-lx106-elf/4.8.2/../../../../xtensa-lx106-elf/bin/ld: src/ARTApp/ARTApp.out section `.text' will not fit in region `iram1_0_seg'
collect2: error: ld returned 1 exit status
I don't know about Arduino, but if you program using the Espressif libraries found at https://github.com/esp8266/esp8266-wiki/raw/master/sdk/, they provide a lot of macros for things like this.
As an example, the "main" of the ESP is put into flash using the following line:
void ICACHE_FLASH_ATTR user_init(){
If you trace the ICACHE_FLASH_ATTR macro you will find this define:
#define ICACHE_FLASH_ATTR __attribute__((section(".irom0.text")))
If you then look at how Espressif sets up the memory sections (https://github.com/esp8266/esp8266-wiki/wiki/Memory-Map), .irom0.text is labelled as the flash memory. Basically, anything marked with ICACHE_FLASH_ATTR is placed in flash memory; anything without it is not.
Again, I'm not sure how to translate this to Arduino code, but it might be time to move away from the Arduino libraries if you are running out of flash space. You didn't specify which ESP breakout you are using, and my mind may be playing tricks on me, but I believe the ESP12-e uses a newer chip which has more flash memory than, say, the ESP01; just another option.
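To illustrate the idea, here is a minimal sketch of how the attribute is typically applied. The two function names are hypothetical, and the fallback macro definition is only there so that the sketch also compiles on a host machine; in a real SDK project the macro comes from the SDK headers, where (as far as I know) it only expands to the section attribute when ICACHE_FLASH is defined on the command line.

```c
#include <stddef.h>
#include <stdint.h>

/* Fallback for building outside the SDK (assumption for this sketch):
 * use the section attribute on Xtensa, and nothing elsewhere. */
#ifndef ICACHE_FLASH_ATTR
#  if defined(__XTENSA__)
#    define ICACHE_FLASH_ATTR __attribute__((section(".irom0.text")))
#  else
#    define ICACHE_FLASH_ATTR
#  endif
#endif

/* Hypothetical, rarely called routine: marked so that its code is
 * placed in the flash-mapped .irom0.text section instead of IRAM. */
uint32_t ICACHE_FLASH_ATTR config_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum = sum * 31u + data[i];
    }
    return sum;
}

/* Hypothetical time-critical routine: no attribute, so it stays in the
 * default .text section, which the linker script maps to iram1_0_seg. */
uint32_t next_tick(uint32_t tick)
{
    return tick + 1u;
}
```

Every function moved to .irom0.text this way frees the corresponding number of bytes in iram1_0_seg.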
It is a little late to answer the question, but I thought others may be interested in a possible solution. This is what I did to track down the iram1_0_seg overflow error on firmware based on the ESP8266 NONOS SDK.
To find out which functions are actually allocated in the iram1_0_seg section, run the following command:
$ xtensa-lx106-elf-nm -av yourprogram.elf | uniq -u | grep "^4010"
Replace 'yourprogram.elf' with the name of your firmware ELF file. All iram1_0_seg functions are in the 4010xxxx address range, hence the grep for 4010. Obviously, this command can only be run once the ELF has been generated. If the ELF file cannot be generated because of the iram1_0_seg overflow error, either remove some code or roll back to a version of your code that still fit.
The output of the above nm command will end with a line such as:
4010680c A _text_end
The iram1_0_seg is limited to 0x8000 bytes on the ESP8266. In the above example, 0x680c bytes are allocated, so there is enough room. The nm command lists all functions allocated in the iram1_0_seg segment. Check which of them do not actually need to be in RAM. Most functions can run from flash, but if you don't mark them with ICACHE_FLASH_ATTR, they end up in RAM (and use up the 0x8000 bytes).
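The arithmetic behind that check can be sketched as a small helper. The origin address 0x40100000 is an assumption taken from the 4010xxxx range mentioned above, and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed layout of iram1_0_seg: starts at 0x40100000, 0x8000 bytes long. */
#define IRAM_ORIGIN 0x40100000u
#define IRAM_LENGTH 0x8000u

/* Bytes of iram1_0_seg in use, given the address of _text_end from nm. */
uint32_t iram_bytes_used(uint32_t text_end)
{
    return text_end - IRAM_ORIGIN;
}

/* Bytes still available in iram1_0_seg. */
uint32_t iram_bytes_free(uint32_t text_end)
{
    return IRAM_LENGTH - iram_bytes_used(text_end);
}
```

For the _text_end value shown above, 0x4010680c, this gives 0x680c bytes used and 0x17f4 (6132) bytes free.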
If you see functions that should be in flash (but aren't) and that come from the standard libraries libc.a or libgcc.a, you have a good chance of making room in the iram1_0_seg segment. Take memcpy as an example: it is already available in ROM and should not be needed from any library. It is pulled from libc.a anyway because libraries take precedence over PROVIDE statements in the linker script. Take a look at your eagle.rom.addr.v6.ld file (in your /esp-open-sdk/sdk/ld folder). In there you will see the PROVIDE(memcpy=..) statement.
This tells you that memcpy is available in ROM. So how do you use the ROM function instead of the libc.a version? The easiest way is to remove memcpy from libc.a so that the linker takes the ROM function. To be safe, I suggest making a copy of libc.a first, in case something goes wrong. Run the following commands in the library folder (where libc.a is located) to remove memcpy from the copy:
$ cp libc.a libc2.a
$ ar d libc2.a lib_a-memcpy.o
After you have changed your Makefile to link with -lc2 instead of -lc, the memcpy function will be taken from ROM. Run the nm command above again to check that it worked: memcpy should no longer be listed in the 4010 range. Then repeat with other libc.a functions as needed (e.g. memcmp, strlen, strcpy, strcmp, ...).
This is how I brought the iram1_0_seg usage down to 0x680c bytes.
A similar procedure can be applied to libgcc.a functions: __muldf3, __mulsf3, __umulsidi3, ...
You can try to change the memory allocation scheme by choosing another option in the Tools > MMU section. For example choose '16KB cache + 48KB IRAM (IRAM)' instead of '32KB cache + 32KB IRAM (balanced)'.

How to create 4KB Linux binaries that render a 3D scene?

I just learned about the 4k demo scene contest. It consists of creating a 4KB executable that renders a nice 3D scene. The cited demo was built for Windows, so I was wondering how one could create 4KB OpenGL scenes on Linux.
A bare "hello world" already consumes 8KB:
$ cat ex.c
#include <stdio.h>
int main()
{
    printf("Hello world\n");
}
$ gcc -Os ex.c -o ex
$ ls -l ex
-rwxrwxr-x 1 cklein cklein 8374 2012-05-11 13:56 ex
The main reason you can't make a small tool with the standard settings is that a lot of symbols and references to standard libraries are pulled into your binary. You must explicitly remove even that basic stuff.
Here's how I did it:
http://phresnel.org/gpl/4k/ntropy2k7/
Relevant Options:
Mostly self-explanatory:
gcc main.c -o fourk0001 -Os -mfpmath=387 \
-mfancy-math-387 -fmerge-all-constants -fsingle-precision-constant \
-fno-math-errno -Wall -ldl -ffast-math -nostartfiles -nostdlib \
-fno-unroll-loops -fshort-double
Massage:
strip helps you get rid of unneeded symbols embedded in your binary:
strip -R .note -R .comment -R .eh_frame -R .eh_frame_hdr -s fourk0001
Code:
You may have to tweak a lot by trial and error. Sometimes a loop gives smaller code, sometimes a call, sometimes a force-inlined function. In my code, for example, instead of having a clean linked list that contains all flame transforms in fancy polymorphic style, I have a fixed array where each element is a big entity containing all parameters, used or unused, as a union of all flames as per Scott Draves' flame paper.
Your tricks won't be portable; other versions of g++ might give suboptimal results.
Note that with the above parameters, you do not write a main() function, but rather a _start() function.
Also note that using libraries is a bit different. Instead of linking SDL and standard library functions the classic, convenient way, you must do it manually. E.g.:
void *libSDL = dlopen( "libSDL.so", RTLD_LAZY );
void *libC = dlopen( "libc.so", RTLD_LAZY );
#if 1
SDL_SetVideoMode_t sym_SDL_SetVideoMode = dlsym(libSDL, "SDL_SetVideoMode");
g_sdlbuff = sym_SDL_SetVideoMode(WIDTH,HEIGHT,32,SDL_HWSURFACE|SDL_DOUBLEBUF);
#else
((SDL_SetVideoMode_t)dlsym(libSDL, "SDL_SetVideoMode"))(WIDTH,HEIGHT,32,SDL_HWSURFACE|SDL_DOUBLEBUF);
#endif
//> need malloc, probably kinda craft (we only use it once :| )
//> load some sdl cruft (cruft!)
malloc_t sym_malloc = dlsym( libC, "malloc" );
sym_rand = dlsym( libC, "rand" );
sym_srand = dlsym( libC, "srand" );
sym_SDL_Flip = dlsym(libSDL, "SDL_Flip");
sym_SDL_LockSurface = dlsym(libSDL, "SDL_LockSurface");
sym_SDL_UnlockSurface = dlsym(libSDL, "SDL_UnlockSurface");
sym_SDL_MapRGB = dlsym(libSDL, "SDL_MapRGB");
And even though no assembler has to be harmed, your code might yield UB.
edit:
Oops, I lied about assembly.
void _start() {
...
asm( "int $0x80" :: "a"(1), "b"(42) );
}
This will make your program exit with status 42.
A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux is an interesting article that goes through a step-by-step process to create an ELF executable as small as possible.
I don't want to spoil the ending, but the author gets it down to a lot smaller than 4K ;)
Take a look at this article on the Ksplice blog from a while back. It talks about linking without the standard libraries.
https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free

how do I always include symbols from a static library?

Suppose I have a static library libx.a. How do I make some symbols (not all) from this library always present in any binary I link with my library? The reason is that I need these symbols to be available via dlopen+dlsym. I'm aware of the --whole-archive linker switch, but it forces all object files from the library archive to be linked into the resulting binary, and that is not what I want...
Observations so far (CentOS 5.4, 32bit) (upd: this paragraph is wrong; I could not reproduce this behaviour)
ld main.o libx.a
will happily strip all non-referenced symbols, while
ld main.o -L. -lx
will link whole library in. I guess this depends on version of binutils used, however, and newer linkers will be able to cherry-pick individual objects from a static library.
Another question is how can I achieve the same effect under Windows?
Thanks in advance. Any hints will be greatly appreciated.
Imagine you have a project which consists of the following three C files in the same folder;
// ---- jam.h
int jam_badger(int);
// ---- jam.c
#include "jam.h"
int jam_badger(int a)
{
    return a + 1;
}
// ---- main.c
#include "jam.h"
int main()
{
    return jam_badger(2);
}
And you build it with a boost-build bjam file like this;
lib jam : jam.c <link>static ;
lib jam_badger : jam ;
exe demo : jam_badger main.c ;
You will get an error like this.
undefined reference to `jam_badger'
(I have used bjam here because the file is easier to read, but you could use anything you want)
Removing the 'static' produces a working binary, as does adding static to the other library, or just using the one library (rather than the silly wrapping of one inside the other).
The reason this happens is because ld is clever enough to only select the parts of the archive which are actually used, which in this case is none of them.
The solution is to surround the static archives with -Wl,--whole-archive and -Wl,--no-whole-archive, like so;
g++ -o "libjam_candle_badger.so" -Wl,--whole-archive libjam_badger.a -Wl,--no-whole-archive
Not quite sure how to get boost-build to do this for you, but you get the idea.
First things first: ld main.o libx.a does not build a valid executable. In general, you should never use ld to link anything directly; always use proper compiler driver (gcc in this case) instead.
Also, "ld main.o libx.a" and "ld main.o -L. -lx" should be exactly equivalent. I am very doubtful you actually got different results from these two commands.
Now to answer your question: if you want foo, bar and baz to be exported from your a.out, do this:
gcc -Wl,-u,foo,-u,bar,-u,baz main.o -L. -lx -rdynamic
Update:
your statement: "symbols I want to include are used by library internally only" doesn't make much sense: if the symbols are internal to the library, why do you want to export them? And if something else uses them (via dlsym), then they are not internal to the library -- they are part of the library public API.
You should clarify your question and explain what you really are trying to achieve. Providing sample code will not hurt either.
I would start by splitting off the symbols you always need into a separate library, retaining only the optional ones in libx.a.
Take the address of the symbol you need to include.
If gcc's optimiser eliminates the reference anyway, do something with the address; that should be enough.
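One way to "do something with the address" is to store it in a table that the compiler is told to keep. The sketch below assumes GCC or Clang (for the used attribute); the table name keep_alive is made up, and jam_badger is defined locally only as a stand-in for a function that would normally live in libx.a. The table has to be in an object file that is always linked (for example, next to main), not inside the archive itself.

```c
/* Stand-in for a function that would normally be defined in libx.a. */
int jam_badger(int a)
{
    return a + 1;
}

typedef int (*int_fn)(int);

/* Storing the address creates a reference to jam_badger from this object
 * file, so the linker has to pull in the archive member that defines it.
 * The used attribute asks the compiler to emit the table even though
 * nothing in the program reads it. */
__attribute__((used)) static const int_fn keep_alive[] = {
    jam_badger,
};
```

For the symbol to be visible to dlsym in the final executable, the link still needs -rdynamic, as in the gcc command above.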
