I'm building a project on the Altera Nios II soft core. Since I'm using the new MAX 10, which has on-chip flash, I would like to partition the code into low-performance code (which runs from on-chip flash, in .text) and high-performance code (which runs from tightly coupled on-chip RAM, in .tight_instr).
Using the __attribute__((section(".tight_instr"))) directive I have managed to place selected functions into that section, but since those functions call some libgcc modules, I would like those modules to end up in the same section as well.
I cannot link all of libgcc into .tight_instr because on-chip RAM is limited, and leaving those modules in .text adds a large penalty to execution time.
What's the right way to write the SECTIONS part of the linker script?
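For illustration, this is the kind of SECTIONS fragment I have in mind, using GNU ld's archive:member input-section syntax to pull in only selected libgcc members (the region names onchip_flash and tightly_coupled_ram and the member names are placeholders; the real names come from the BSP-generated script and from inspecting libgcc.a):

```
SECTIONS
{
  /* high-performance code: user-tagged functions plus the few
     libgcc members they actually call (listed explicitly to save RAM) */
  .tight_instr :
  {
    *(.tight_instr)
    *libgcc.a:_mulsi3.o(.text .text.*)   /* example member names */
    *libgcc.a:_divsi3.o(.text .text.*)
  } > tightly_coupled_ram

  /* everything else stays in the slower on-chip flash */
  .text :
  {
    *(.text .text.*)
  } > onchip_flash
}
```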
Related
ARM Cortex embedded application.
I have a bootloader that is 40KB, mostly crypto code for flash signature verification.
I have a 500KB flash.
The chip’s sectors are 128KB, and the protection scheme means the first sector must contain the bootloader - so my 40KB bootloader will occupy all 128KB of that sector.
I would like to reclaim some of this wasted space by including my application’s crypto libraries.
SECTOR0
[64KB bootloader reserved]
[64KB statics]
SECTORn
[application references static]
…
Can I put a static crypto library at a specific location like this with CDT/GCC and with the managed builder?
I am considering this part of the “bootloader code” so I am assuming that I will at least need to modify the .ld file to create a new region?
The application will need to know the results of this linking, but without including the code and placing it in memory itself - is this still a static link? There are maybe only three or four functions in the library that I want to access from the application.
(There are no MPU issues to deal with, I am confident I can jump to the static code area and back even though it’s in the first sector. This static area is intended to be execute only. I am confident only my application software can be installed and executed in the flash region)
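To make the question concrete, this is roughly the kind of .ld fragment I imagine (all region names, addresses, and the libcrypto.a archive name are made up for illustration; the real base addresses depend on the part):

```
MEMORY
{
  BOOT   (rx) : ORIGIN = 0x08000000, LENGTH = 64K   /* bootloader     */
  CRYPTO (rx) : ORIGIN = 0x08010000, LENGTH = 64K   /* static crypto  */
  APP    (rx) : ORIGIN = 0x08020000, LENGTH = 384K  /* application    */
}

SECTIONS
{
  /* pin the crypto archive's code into the second half of sector 0 */
  .crypto :
  {
    *libcrypto.a:(.text .text.* .rodata .rodata.*)
  } > CRYPTO
}
```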
I know that dynamically linked executables are smaller on disk, but do they use more RAM at run time? If so, why?
The answer is "it depends how you measure it", and also "it depends which platform you're running on".
Static linking uses less runtime RAM, because for dynamic linking the entire shared object needs to be loaded into memory (I'll be qualifying this statement in a second), whilst with static linking only those functions you actually need are loaded.
The above statement isn't 100% accurate. Only the shared object pages that actually contain code you use are loaded. This is still much less efficient than statically linking, which compresses those functions together.
On the other hand, dynamic linking uses much less runtime RAM, as all programs using the same shared object use the same in RAM copy of the code (I'll be qualifying this statement in a second).
The above is a true statement on Unix like systems. On Windows, it is not 100% accurate. On Windows (at least on 32bit Intel, I'm not sure about other platforms), DLLs are not compiled with position independent code. As such, each DLL carries the (virtual memory) load address it needs to be loaded at. If one executable links two DLLs that overlap, the loader will relocate one of the DLLs. This requires patching the actual code of the DLL, which means that this DLL now carries code that is specific to this program's use of it, and cannot be shared. Such collisions, however, should be rare, and are usually avoidable.
To illustrate with an example, statically linking glibc will probably cause you to consume more RAM at run time, as this library is, in all likelihood, already loaded in RAM before your program even starts. Statically linking some unique library only your program uses will save run time RAM. The in-between cases are in-between.
Different processes using the same DLL/.so file can share its read-only memory pages; this includes code (text) pages.
However, each DLL loaded in a given program has to have its own page(s) for writable global or static data. These pages may be 4/16/64KB or bigger depending on the OS. When statically linked, the static data of all the libraries can be combined into the same pages.
Programs, when running on common operating systems like Linux, Windows, macOS, Android, and so on, run as processes having some virtual address space. This uses virtual memory (implemented by the kernel driving the MMU).
Read a good book like Operating Systems: Three Easy Pieces to understand more.
So programs don't consume RAM directly. RAM is a resource managed by the kernel. When RAM becomes scarce, your system experiences thrashing. Read also about the page cache and about memory overcommitment (a feature that I dislike and that I often disable).
The advantage of using a shared library, when the same library is used by several processes, is that its code segment appears (technically, is paged) only once in RAM.
However, dynamic linking has a small overhead (even in memory), e.g. to resolve relocations. So if a library is used by only one process, that might consume slightly more RAM than if it was statically linked. In practice you should not bother most of the time, and I recommend using dynamic linking systematically.
And in practice, for huge processes (such as your browser), the data and the heap consume much more RAM than the code.
On Linux, Drepper's paper How To Write Shared Libraries explains a lot of things in details.
On Linux, you might use proc(5) and pmap(1) to explore virtual address spaces. For example, try cat /proc/$$/maps and cat /proc/self/maps and pmap $$ in a terminal. Use ldd(1) to find out the dynamic libraries dependencies of a program, e.g. ldd /bin/cat. Use strace(1) to find out what syscalls(2) are used by a process. Those relevant to the virtual address space include mmap(2) and munmap, mprotect(2), mlock(2), the old sbrk(2) -obsolete- and execve(2).
For embedded systems where the program runs standalone on a microcontroller:
Are the programs always statically linked, or may they be dynamically linked on certain occasions?
From Wikipedia:
a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed (at "run time"), by copying the content of libraries from persistent storage to RAM, and filling jump tables and relocating pointers.
So it implies that dynamic linking is possible only if:
1) You have some kind of OS
2) You have some kind of persistent storage / file system.
On bare-metal micros this is usually not the case.
Simply put: if there is a full-grown operating system like Linux running on the microcontroller, then dynamic linking is possible (and common).
Without such an OS, you very, very likely use static linking. Here the linker (basically) not only links the modules and libraries, but also performs the work that would otherwise be done by the OS program loader.
Let's stay with these (smaller) embedded systems for now.
Apart from static or dynamic linking, the linker also does relocation. Simply put, this changes the internal (relative) addresses of the program into absolute addresses on the target device.
It is not common on simple embedded systems primarily because it is neither necessary nor supported by the operating system (if any). Dynamic linking implies a certain amount of runtime operating system support.
The embedded RTOS VxWorks supports dynamic linking in the sense that it can load and link partially linked object code from a network or file system at runtime. Similarly, larger embedded RTOSes such as QNX support dynamic linking, as does embedded Linux.
So yes large embedded systems may support dynamic linking. In many cases it is used primarily as a means to link LGPL licensed code to a closed source application. It can also be used as a means of simplifying and minimising the impact of deploying changes and updates to large systems.
How can one make dynamically loadable tasks with an RTOS for an embedded system?
The dynamic tasks are not created statically but are left as relocatable objects in the final ELF image. Then at run time, the code for these tasks is loaded from some memory location and run from there.
In short, I don't want static tasks, but rather dynamic tasks, in an embedded system. Can someone guide me on how to do that?
EDIT: I would like to know how to do this with a bare-metal RTOS like FreeRTOS.
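A pseudocode-style sketch of what I have in mind (the FreeRTOS xTaskCreate call is real; the loader and relocation helpers, the blob symbols, and the constants are entirely hypothetical and would have to be written for the target):

```c
/* 1. copy the task's position-independent code blob from flash to RAM */
memcpy(ram_exec_area, task_blob_in_flash, task_blob_size);

/* 2. apply relocations / fix up the GOT -- hypothetical helper */
relocate(ram_exec_area, task_blob_relocs);

/* 3. hand the entry point to the RTOS as an ordinary task function */
TaskFunction_t entry = (TaskFunction_t)ram_exec_area;
xTaskCreate(entry, "dyntask", STACK_DEPTH, NULL, TASK_PRIO, NULL);
```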
Thanks
We have a very modular application with a lot of shared objects (.so). Some people argue that on low-end platforms with limited memory/flash, it is better to statically link everything into one big executable as shared objects have overhead.
What is your opinion on this?
Best Regards,
Paul
The costs of shared libraries are roughly (per library):
At least 4k of private dirty memory.
At least 12k of virtual address space.
Several filesystem access syscalls, mmap and mprotect syscalls, and at least one page fault at load time.
Time to resolve symbol references in the library code.
Plus position-independent code costs:
Loss of one general-purpose register (this can be huge on x86 (32bit) but mostly irrelevant on other archs).
Extra level of indirection accessing global/static variables (and constants).
If you have a large long-running application, the costs may not matter to you at all, unless you're on a tiny embedded system. On the other hand, if you're writing something that may be invoked many times for short tasks (like a language interpreter) these costs can be huge. Putting all of the standard modules in their own .so files rather than static linking them by default is a huge part of why Perl, Python, etc. are so slow to start.
Personally, I would go with the strategy of using dynamically loaded modules as an extensibility tool, not as a development model.
Unless memory is extremely tight, the size of one copy of these files is not the primary determining factor. Given that this is an embedded system, you probably have a good idea of what applications will be using your libraries and when. If your application opens and closes the multiple libraries it references dutifully, and you never have all the libraries open simultaneously, then the shared library will be a significant savings in RAM.
The other factor you need to consider is the performance penalty. Opening a shared library takes a small (usually trivial) amount of time; if you have a very slow processor or hard-to-hit real-time requirements, the static library will not incur the loading penalty of the shared library. Profile to find whether this is significant or not.
To sum up, shared libraries can be significantly better than static libraries in some special cases. In most cases, they do little to no harm. In simple situations, you get no benefit from shared libraries.
Of course, the shared library will be a significant savings in Flash if you have multiple applications (or versions of your application) which use the same library. If you use a static library, one copy (which is about the same size as the shared library[1]) will be compiled into each. This is useful when you're on a PC workstation. But you knew that. You're working with a library which is only used by one application.
[1] The memory difference of the individual library files is small. Shared libraries add an index and symbol table so that dlopen(3) can load the library. Whether or not this is significant will depend on your use case; compile for each and then compare the sizes to determine which is smaller in Flash. You'll have to run and profile to determine which consumes more RAM; they should be similar except for the initial loading of the shared library.
Having lots of libraries of course means more metadata must be stored, and some of that metadata (library section headers etc.) will need to be kept in RAM when loaded. But the difference should be pretty negligible, even on (moderately modern) embedded systems.
I suggest you try both alternatives, and measure the used space in both FLASH and RAM, and then decide which is best.