ALIGN in Linker Scripts


What does the ALIGN keyword do in linker scripts? I have read many tutorials about linker scripts, but I can't understand what ALIGN really does. Can anyone explain it simply? Thanks!

A typical usage is
. = ALIGN(8);
This means: insert padding bytes until the current location is aligned on an 8-byte boundary. That is:
while ((current_location & 7) != 0)
*current_location++ = padding_value;

The ALIGN() instruction tells the linker that a section (such as .bss or .text) should be aligned to the given boundary.
For a typical idea, you can take a look here (4.6.3 "Output Section Description")
e.g.
On a 32-bit machine, .data typically needs to be word aligned. The following aligns it to the 4-byte word size and directs it to the data_sdram region:
.data : ALIGN(4)
{
*(.data*)
} > data_sdram

. = ALIGN(8);
corresponds to the following (a linker script example using operators):
data = .;
. = (data + 0x8 - 1) & ~(0x8 - 1);

There are two typical uses for ALIGN:
To align the start of a section to a required boundary.
To pad a section to a required size.
/* . = ALIGN(BEGIN) ; equivalent to below, if VMA==LMA */
.section : ALIGN(BEGIN)
{
/* . = ALIGN(BEGIN) ; !!Not equivalent to above!! */
...
. = ALIGN(END);
}
The first ALIGN ensures that the starting address is a multiple of BEGIN, which should be a power of 2. The second ALIGN ensures that the size is a multiple of END. Normally you will want them to be equal and powers of two, which indicates how many address bits are needed and can help with caching.
For instance, copy loops can be optimized with large transfer sizes if the linker script guarantees that the size is a multiple of a fixed value.
An ALIGN(BEGIN) inside the '.section' would appear to result in some junk at the beginning of the section (see FILL() and =fill for ways to control what is written). See below for the reason not to use this; it actually does nothing.
Some examples of structures that need to be aligned are:
a vector table
an mmu table
time critical routines that need cache fills
.bss and .initdata for initialization efficiency
The GNU ld syntax allows you to use ALIGN anywhere.
If you hand-code structures with BYTE, LONG, etc., you might need to align tables or structures within a section. This is a fairly obscure use of a linker script, but it is possible and is an exception to the two uses above. However, for most people beginning to understand ld, those two uses are almost always the desired ones.
Operations inside a section work as if all of the addresses are relative, so the start of the section is zero (and the "not equivalent" form above doesn't work). As the last statement is . = ALIGN(...);, it sets the size of the section, because the location counter is zero based. This will even work if the start address of the section is not aligned.

Related

Relocation of data from flash to RAM during boot phase

I'm currently trying to solve a problem which requires moving data from flash to RAM during the booting phase. Right now everything is only being simulated using a microcontroller architecture which is based on the open-source PULPissimo. For simulation I use QuestaSim by Mentor Graphics. Toolchain is GNU.
Unfortunately I have pretty much zero experience with relocating data during the boot phase, so I've read some posts and tutorials on this topic, but I'm still confused about quite a few things.
The situation is as follows: I set my boot mode to boot from flash, which in this case means that the code will already reside pre-loaded inside the flash memory. The code is just a simple hello world or any other program really. When I simulate, everything is compiled and the modules are loaded. After the boot phase the output "hello world" is displayed and the simulation is done. This means everything works as intended, which is obviously a good sign and a good starting point.
Side note: As far as I know the PULPissimo architecture does not support direct boot from flash at the moment, so the data from flash has to be moved to RAM (which they call L2) and executed.
From what I understand there are multiple things involved in the booting process. Please correct me if anything in the next paragraph is wrong:
First: The code that will be executed. It's written in C and has to be translated into a language which the architecture understands. This should be done automatically and reside in the flash memory pre boot phase. Considering that the code is actually being executed as mentioned above there is not much confusion here.
Second: The bootloader. This is also written in C. It is also translated and will be burned into ROM later on, so changing this wouldn't make much sense. It loads the data which is necessary for booting. It can also differentiate whether you want to boot from flash or JTAG.
Third: The main startup file crt0.S. This is one of the things that confuse me, especially what exactly it does and what the difference between the bootloader and the main startup file is. Wikipedia (yes, I know...) defines it as: "crt0 (also known as c0) is a set of execution startup routines linked into a C program that performs any initialization work required before calling the program's main function." So does that mean that it has nothing to do with the boot phase but instead kind of "initializes" and/or loads only the code that I want to execute?
Fourth: The linker script link.ld. Even though this is the part I read the most about, there are still quite a lot of questions. From what I understand the linker script contains information on where to relocate data. The data that is to be relocated is the data of the code I want to execute(?). It consists of different parts explained here.
.text program code;
.rodata read-only data;
.data read-write initialized data;
.bss read-write zero initialized data.
Sometimes I see more than those sections, not just text, rodata, data, bss. But how does the linker script know what the "text" is and what the "data" is and so on?
I know that's quite a lot and probably pretty basic stuff for a lot of you but I'm genuinely confused.
What I am trying to accomplish is relocating data from flash to RAM during the boot phase. Not only the code that I want to execute but more data that is also located in the flash memory. Consider the following simple scenario: I want to run a hello world C program. I want to boot from flash. Up to this point nothing special and everything works fine. Now after the data of the code I also load more data into flash, let's say 256 bytes of A (hex) so I can check my memory in QuestaSim by looking for AAAAAAAA sections. I also want to say where I want that data to be loaded during boot phase, for example 0x1C002000. I tried playing around with the crt0.S and the linker.ld files but with no success. The only time it actually worked was when I modified the bootloader.c file, but I have to assume that this is already burned into ROM and I can't do any modifications on it. To be honest I'm not even sure if what I'm trying to do is even possible without any changes to the bootloader.c.
Thank you for your time.
Update
So I was playing around a bit and tried to create a simple example to understand what's happening and what manipulations or relocations I can do.
First I created a C file which basically contains only data.
Lets call it my_test_data.c
int normal_arr[] = {0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555};
int attribute_arr[] __attribute__ ((section(".my_test_section"))) = {0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666, 0x66666666};
static int static_arr[] = {0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777, 0x77777777};
int normal_var = 0xCCCCCCCC;
static int static_var = 0xDDDDDDDD;
int result_var;
Then I created the object file. I looked into it via objdump and could see my section my_test_section :
4 .my_test_section 00000020 00000000 00000000 00000054 2**2
After that I tried to modify my linker script so that this section would be loaded to an address that I specified. These are the lines I added in the linker script (probably more than needed). It is not the whole linker script!:
CUT01 : ORIGIN = 0x1c020000, LENGTH = 0x1000
.my_test_section : {
. = ALIGN(4);
KEEP(*(.my_test_section))
_smytest = .;
*(.my_test_section)
*(.my_test_section.*)
_endmytest = .;
} > CUT01
I wanted to see what data from my_test_data.c gets moved and where it gets moved. Remember that my goal is to have the data inside the RAM (Addr.: 0x1c020000) after booting (or during booting however you prefer). Unfortunately only:
int normal_arr[] = {0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555, 0x55555555};
gets moved into ROM (Addr.: 0x1A000000) as it seems to be part of the .text section (iirc) which is already being handled by the linker script:
.text : {
. = ALIGN(4);
KEEP(*(.vectors))
_stext = .;
*(.text)
*(.text.*)
_etext = .;
*(.lit)
( ... more entries ...)
_endtext = .;
} > ROM
What also confuses me is the fact that I can add this line in the above .text section:
*(.my_test_section)
and then the data from the attribute_arr will be located in ROM but if I try to move it to the address I added (CUT01) nothing will ever end up there.
I also generated the map file which also lists my_test_section. This is an excerpt from the map file (don't mind the locations of where the output files are on my machine).
.my_test_section
0x000000001c020000 0x3c
0x000000001c020000 _mts_start = .
*(.text)
*(.text.*)
*(.comment)
.comment 0x000000001c020000 0x1a /.../bootloader.o
0x1b (size before relaxing)
.comment 0x000000001c02001a 0x1b /.../my_test_data.o
*(.comment.*)
*(.rodata)
*(.rodata.*)
*(.data)
*(.data.*)
*(.my_test_section)
*fill* 0x000000001c02001a 0x2
.my_test_section
0x000000001c02001c 0x20 /.../my_test_data.o
0x000000001c02001c attribute_arr
*(.my_test_section.*)
*(.bss)
*(.bss.*)
*(.sbss)
*(.sbss.*)
0x000000001c02003c . = ALIGN (0x4)
0x000000001c02003c _mts_end = .
OUTPUT(/.../bootloader elf32-littleriscv)
I will continue to try to get this to work but right now I'm kind of confused as to why it seems like my_test_section gets recognized but not moved to the location which I specified. This makes me wonder if I made a mistake (or several mistakes) in the linker script or if one of the other files (bootloader.c or crt0.S) might be the reason.

There is a lot being asked here. I'm going to take a stab at answering part of the questions. You ask:
But how does the linker script know what the "text" is and what the
"data" is and so on?
The additional, custom, sections, and the predefined sections, are handled differently.
Custom sections usually require the related variables to have the section specified with a pragma.
The standard sections are defined by their type:
text: this is the code. that should be clear; the instructions to the computer of what to do, not the data
rodata: const data -- such as literal strings (e.g. "This is a literal string" in the code). A good compiler/linker should put variables defined as 'const' (not const parameters) in the rodata section as well.
bss: static or global variables which are not initialized when declared:
int global_var_not_a_good_idea; // not in a function; local variables are different
static int anUninitializedArray[10];
data: static or global variables which are initialized when declared
int initializedGlobalVarStillNotRecommended = 10;
static int initializedArray[] = { 1, 2, 3, 4, 5, 6};
This data should be copied to RAM when the program loads.
EDIT:
Somewhere in your startup code should be a reset handler. This function will be called on processor reset. It should be the function that copies data to RAM, possibly clears the zero segment, initializes the C library, etc. When finished with initializations, it should call main();
Here is an example (in this case, from generated or example code for the Atmel SAMG55 processor, but the idea should be the same) of relocating data to RAM.
In the linker script memory space definitions (I'm going to leave out the real numbers):
ram (rwx) : ORIGIN = 0x########, LENGTH = 0x########
in the linker script section definitions:
.relocate : AT (_etext)
{
. = ALIGN(4);
_srelocate = .;
*(.ramfunc .ramfunc.*);
*(.data .data.*);
. = ALIGN(4);
_erelocate = .;
} > ram
Note that _etext is the end of the previous section.
_srelocate and _erelocate are used in the startup code to relocate, I believe, everything in .data (and, apparently, .ramfunc as well) in all the files:
/* Initialize the relocate segment */
pSrc = &_etext;
pDest = &_srelocate;
if (pSrc != pDest) {
for (; pDest < &_erelocate;) {
*pDest++ = *pSrc++;
}
}
This is a pretty standard example. If you search in your project for where main() is called, you should find something similar.
If all you want to do is relocate the entire .data section to the address you are specifying in RAM, you should only need to change the definition of the location of the RAM section, not define your own. You only need to define your own section if you want to move specific variables to a different location.
I am not familiar with the platform on which you are working, but there should be either a C or assembly file with the startup code that runs before crt0. This will set up the stack, heap, and interrupt vectors. In some implementations, this code also copies the .data section to RAM, and may be set up to copy everything from the beginning of the data section until the beginning of the .bss section, to RAM. If your platform is set up in this way, if you locate your section between .data and .bss, it should be copied with no other changes from you (see here, for example).
If, however, you want to copy the data to a different location, you will probably have to add code to copy it, either in the loader code or at the very beginning of main, using the symbols you defined for the beginning and ending of the section.
Since you mention, though, that it is read-only data, I would recommend leaving it in read-only memory if you can.

Is accessing the "value" of a linker script variable undefined behavior in C?

The GNU ld (linker script) manual Section 3.5.5 Source Code Reference has some really important information on how to access linker script "variables" (which are actually just integer addresses) in C source code. I used this info. to extensively use linker script variables, and I wrote this answer here: How to get value of variable defined in ld linker script from C.
However, it is easy to do it wrong and make the mistake of trying to access a linker script variable's value (mistakenly) instead of its address, since this is a bit esoteric. The manual (link above) says:
This means that you cannot access the
value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.
Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.
The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?
Quick refresher:
Imagine in linker script (ex: STM32F103RBTx_FLASH.ld) you have:
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 128K
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 20K
}
/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);
And in your C source code you do:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
Sample printed output
(this is real output: it was actually compiled, run, and printed by an STM32 mcu):
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x8000000
__flash_start__ addr = 0x20080000 <== NOTICE LIKE I SAID ABOVE: this one is completely wrong (even though it compiles and runs)! <== Update Mar. 2020: actually, see my answer, this is just fine and right too, it just does something different is all.
Update:
Response to @Eric Postpischil's 1st comment:
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
Yes, but that's not my question. I'm not sure if you're picking up the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t, for ex, at __flash_start__, you'd do this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)
OR this:
extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye
But most definitely NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)
and NOT this:
extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right
Related:
Why do STM32 gcc linker scripts automatically discard all input sections from these standard libraries: libc.a, libm.a, libgcc.a?
[My answer] How to get value of variable defined in ld linker script from C

Shorter answer:
Accessing the "value" of a linker script variable is NOT undefined behavior, and is fine to do, so long as you want the actual data stored at that location in memory and not the address of that memory or the "value" of a linkerscript variable which happens to be seen by C code as an address in memory only and not a value.
Yeah, that's kind of confusing, so re-read that 3 times carefully. Essentially, if you want to access the value of a linker script variable just ensure your linker script is set up to prevent anything you don't want from ending up in that memory address so that whatever you DO want there is in fact there. This way, reading the value at that memory address will provide you something useful you expect to be there.
BUT, if you're using linker script variables to store some sort of "values" in and of themselves, the way to grab the "values" of these linker script variables in C is to read their addresses, because the "value" you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE "ADDRESS" of that linker script variable, since linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.
Here's some really valuable and correct comments under my question which I think are worth posting in this answer so they never get lost. Please go upvote his comments under my question above.
The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil
That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the “value” of a symbol and a programming language’s notion of the “value” of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage...
THIS PART IS REALLY IMPORTANT and we should get the GNU linker script manual updated:
It goes too far when it tells you to “never attempt to use its value.”
It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensured there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
Long answer:
I said in the question:
// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);
// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);
// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);
(See discussion under the question for how I came to this).
Looking specifically at #3 above:
Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But, it is NOT undefined behavior! What it is actually doing, instead, is reading the contents (value) of that address (0x8000000) as a uint32_t type. In other words, it's simply reading the first 4 bytes of the FLASH section, and interpreting them as a uint32_t. The contents (uint32_t value at this address) just so happen to be 0x20080000 in this case.
To further prove this point, the following are exactly identical:
// Read the actual *contents* of the `__flash_start__` address as a 4-byte value!
// forward declaration to make a variable defined in the linker script
// accessible in the C code
extern uint32_t __flash_start__;
// These 2 read techniques do the exact same thing.
uint32_t u32_1 = __flash_start__; // technique 1
uint32_t u32_2 = *((uint32_t *)&__flash_start__); // technique 2
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);
The output is:
u32_1 = 0x20080000
u32_2 = 0x20080000
Notice they produce the same result. They each are producing a valid uint32_t-type value which is stored at address 0x8000000.
It just so turns out, however, that the u32_1 technique shown above is a more straight-forward and direct way of reading the value is all, and again, is not undefined behavior. Rather, it is correctly reading the value (contents of) that address.
I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced before I was supposed to use the u32_2 technique shown above only, but it turns out they are both just fine, and again, the u32_1 technique is clearly more straight-forward (there I go talking in circles again). :)
Cheers.
Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?
One more little tidbit. I actually ran this test code on an STM32F777 mcu, which has 512KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is 0x20000000 + 512K = 0x20080000. This just so happens to also be the value stored at the very start of the Vector Table, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".
I know that the Vector Table sits right at the start of the program memory, which is located in Flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack:
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
Furthermore, _estack is defined in my linker script as follows:
/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM); /* end of RAM */
So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is set to be the initial stack pointer value, which is defined as _estack right in my linker script file, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!
See also:
[my answer] How to get value of variable defined in ld linker script from C

How can i extract constants' addresses,added by compiler optimization, from ELF file?

I'm writing some code size analysis tool for my C program, using the output ELF file.
I'm using readelf --debug-dump=info to generate a DWARF dump file.
I've noticed that, as part of its optimization, my compiler adds new constants to the .rodata section that are not in the DWARF file.
So the .rodata section size includes their sizes, but I don't have their sizes in DWARF.
Here is an example from the map file:
*(.rodata)
.rodata 0x10010000 0xc0 /<.o file0 path>
0x10010000 const1
0x10010040 const2
.rodata 0x100100c0 0xa /<.o file1 path>
*fill* 0x100100ca 0x6
.rodata 0x100100d0 0x6c /<.o file2 path>
0x100100d0 const3
0x100100e0 const4
0x10010100 const5
0x10010120 const6
*fill* 0x1001013c 0x4
In file1 above, although I didn't declare any const variable, the compiler added one: this const takes space in .rodata, yet there is no symbol/name for it.
Here is the code inside some function that generates it:
uint8 arr[3][2] = {{146,179},
{133, 166},
{108, 141}} ;
So the compiler adds some constant values to optimize loading the array.
How can I extract these hidden additions from the data sections?
I want to be able to fully characterize my code: how much space is used in each file, etc.

I am guessing here - it will be linker dependent, but when you have code such as:
uint8 arr[3][2] = {{146,179},
{133, 166},
{108, 141}} ;
arr at run-time exists in r/w memory, but its initialiser will be located in R/O memory to be copied to the R/W memory when the array is initialised. The linker need only provide the address, because the size will be known locally as a compile-time constant embedded as a literal in the initializing code. Consequently the size information does not appear in the map, because the linker discards that information.
Length is however implicit by the address of adjacent objects for filled space. So for example:
The size of const1 for example is equal to const2 - const1 and for const6 it is 0x1001013c - const6.
It is all rather academic however - you have precise control over this in terms of the size of your constant initialisers. They are not magically created data unrelated to your code, and I am not convinced that they are a product of optimization as you suggest. The non-zero initialisers must exist regardless of optimisation options, and in any case optimisation primarily affects the size and/or speed of code (.text) rather than data. The impact on data sizes is likely to relate only to padding and alignment and in debug builds possibly "guard-space" for overrun detection.
However there is no need at all for you to guess. You can determine how this data is used by inspecting the disassembly or observing its execution (at the instruction level) in a debugger - to see exactly where initialised variables are copying the data from. You could even place a read-access break-point at these addresses and you will determine directly what code is utilizing them.

To get the sizes within an ELF file in detail, use the following:
"You can use nm and size to get the size of functions and ELF sections.
To get the size of the functions (and objects with static storage duration):
$ nm --print-size --size-sort --radix=d tst.o
The second column shows the size in decimal of function and objects.
To get the size of the sections:
$ size -A -d tst.o
The second column shows the size in decimal of the sections."
Tool to analyze size of ELF sections and symbol

Understanding certain ELF file structure

From ARM's infocenter, regarding section static linking and relocations:
** Section #1 'ER_RO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 28 bytes (alignment 4)
Address: 0x00008000
$a
.text
bar
0x00008000: E59f000C .... LDR r0,[pc,#12] ; [0x8014] = 0x801C
0x00008004: E5901000 .... LDR r1,[r0,#0]
0x00008008: E2411001 ..A. SUB r1,r1,#1
0x0000800C: E5801000 .... STR r1,[r0,#0]
0x00008010: E12FFF1E ../. BX lr
$d
0x00008014: 0000801C .... DCD 32796
$a
.text
foo
0x00008018: EAFFFFF8 .... B bar ; 0x8000
and from ELF for the ARM architecture:
Table 4-7, Mapping symbols
Name Meaning
$a - Start of a sequence of ARM instructions
$d - Start of a sequence of data items (for example, a literal pool)
As you can see, the ELF file contains a section in which there is code (bar), then data/ro (32796), then more code (foo) in consecutive addresses.
Now, a basic principle regarding any SW file structure is that the SW is composed of different and separate sections: text (code), data, and bss (and rodata if we want to be pedantic), as we can see if we examine the MAP file.
This ELF structure is not consistent with that basic principle, so my question is: what is going on here? Am I mistaken about this basic principle? If not, will this ELF structure be changed at run time to meet the section separation?
And why does the ELF section contain mixed types in a sequential address range?
NOTE: I assume the scatter file used in the example is the default one, since the document containing the example does not provide any scatter file along with it.

At run time, the sections do not matter, only the PT_LOAD segments in the program header. The ELF specification is quite flexible there as well, but some loaders have restrictions on the PT_LOAD segments they can process.
The reason for splitting code and data this way could be that this architecture supports only a limited range of PC-relative addressing and needs a constant pool for loading most constants (because constructing them via immediates is too expensive). Having as few large constant pools as possible is attractive because it leads to improved data and instruction cache utilization (instead of caching memory which is not of the right type and thus can never be used), but you may still need more than one if the code size exceeds what can be addressed directly.

align all object files in data/sbss section in linker script

EDIT: Solved - the linker script property "SUBALIGN(32)" applied to the static data sections does exactly what I required: it forces each linked object file to be aligned to a 32-byte boundary, with padding inserted automatically.
__bss_start = .;
.bss :
SUBALIGN(32)
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
SORT(CONSTRUCTORS)
. = ALIGN(32);
} = 0
. = ALIGN(32);
I am building a multiprogram benchmark on a cache-incoherent architecture, comprised of multiple instances of the EEMBC suite renamed and linked together.
The problem is that the libraries are not cache line aligned in the writable data segments, and I am getting data corruption here (evidenced by cache line thrashing in a coherent simulation).
For example, the cache line at 0x7500 is being shared between the cores operating on Viterb0 and Viterb1. The map output indicates that this is where library 0 runs into the cache line that library 1 starts in:
...
.bss 0x000068e8 0xc24 ../EEMBClib/libmark_0.a(renamed_renamed_viterb00_viterb00.o)
.bss 0x0000750c 0x4 ../EEMBClib/libmark_1.a(renamed_renamed_viterb00_bmark_lite.o)
...
I need to align every object file linked into the various data segments to 32-byte boundaries, but I only know how to align the whole section. The current .bss section is:
__bss_start = .;
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
SORT(CONSTRUCTORS)
. = ALIGN(32);
} = 0
. = ALIGN(32);
Any help would be greatly appreciated here. Rebuilding the libraries with padding isn't an option I want to consider yet, as I would like a more robust solution for future linking on this platform.

The solution is the linker script property "SUBALIGN(32)". When applied to the static data sections, it does exactly what I required: it forces each linked object file to be aligned to a 32-byte boundary, with padding inserted automatically.
__bss_start = .;
.bss :
SUBALIGN(32)
{
*(.bss .bss.* .gnu.linkonce.b.*)
} = 0
. = ALIGN(32);
gives the fixed result
.bss 0x00006940 0xc24 ../EEMBClib/libmark_0.a(renamed_renamed_viterb00_viterb00.o)
fill 0x00007564 0x1c 00000000
.bss 0x00007580 0x4 ../EEMBClib/libmark_1.a(renamed_renamed_viterb00_bmark_lite.o)
instead of
.bss 0x000068e8 0xc24 ../EEMBClib/libmark_0.a(renamed_renamed_viterb00_viterb00.o)
.bss 0x0000750c 0x4 ../EEMBClib/libmark_1.a(renamed_renamed_viterb00_bmark_lite.o)
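The arithmetic behind these two map excerpts can be checked with a short Python sketch. The helper names align_up and cache_line are made up for this illustration, and the 32-byte line size is taken from the question; only the addresses and sizes come from the map output.

```python
def align_up(address, boundary):
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


def cache_line(address, line_size=32):
    """Base address of the cache line that contains address."""
    return address & ~(line_size - 1)


# Values copied from the two map excerpts above.
old_end = 0x68E8 + 0xC24  # first byte after libmark_0's .bss, unfixed: 0x750c
new_end = 0x6940 + 0xC24  # same object with SUBALIGN(32): 0x7564

# Unfixed: last byte of library 0 and first byte of library 1 share a line.
print(hex(cache_line(old_end - 1)), hex(cache_line(old_end)))  # 0x7500 0x7500

# Fixed: the next object starts at the next 32-byte boundary after 0x7564.
print(hex(align_up(new_end, 32)), hex(align_up(new_end, 32) - new_end))  # 0x7580 0x1c
```

The 0x1c bytes of padding match the fill entry in the fixed map output, and 0x7580 is where the second object now starts.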

(Apologies that this is at least currently more a collection of thoughts than a concrete answer, but it's going to be a bit long to post in comments)
Probably the first thing that would be worth doing is to come up with some verification routine that parses objdump/readelf output to verify if your alignment requirement has been met, and put this into your build process as a check. If you can't do it at compile time, at least do it as a run time check.
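As a rough idea of what such a check could look like, here is a minimal Python sketch that scans linker map text rather than objdump/readelf output. It assumes each entry sits on one line in the form "section address size source", as in the map excerpts quoted in the question; the function name misaligned_entries and the regular expression are inventions for this example, and real map files may wrap long section names onto a separate line, which this sketch does not handle.

```python
import re

# Assumed line shape: <.section> 0x<address> 0x<size> <object or archive(member)>
MAP_LINE = re.compile(
    r"^\s*(\.\S+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+(\S+)\s*$"
)


def misaligned_entries(map_text, boundary=32, section_prefix=".bss"):
    """Return (address, size, source) for each matching entry not on boundary."""
    bad = []
    for line in map_text.splitlines():
        match = MAP_LINE.match(line)
        if not match or not match.group(1).startswith(section_prefix):
            continue  # not an input-section line we care about (e.g. fill lines)
        address = int(match.group(2), 16)
        size = int(match.group(3), 16)
        if size and address % boundary:
            bad.append((address, size, match.group(4)))
    return bad
```

A build step could read the map file, call misaligned_entries on its contents, and fail the build if the returned list is not empty.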
Then some paths of achieving the alignment could be investigated.
Assume for a minute that a custom section is created and all data with this requirement is placed there with pragmas in the source code. Something to look into would then be if the linker is willing to honor the section alignment setting given in the occurrence of that section in each object file. You could for example hexedit one of the objects to increase that alignment and use your dump processor to see what happens. If this works out, great - it seems like the proper way to handle the task, and hopefully there's a reasonable way to specify the alignment size requirement for that section which will end up in the object files.
Another idea would be to attempt some sort of scripted allocation adjustment. For example, use objcopy to join all the applicable sections into one file, while stripping them out of the others. Analyze the file and figure out what allocations you want, then use objcopy or a custom ELF modification program to set that. Maybe you could even make this modification to the fully linked result, at least if you have your linker script put the special section at the end, so that you don't have to move other allocations out of its way when you grow it to achieve internal alignment.
If you don't want to get into modifying ELF files, another approach for doing your own auxiliary linking with a script could be to calculate the size of each object's data in the special section, then automatically generate an additional object file that simply pads that section out to the next alignment boundary. Your link stage would then specify objects in a list: program1.o padding1.o program2.o padding2.o
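The size calculation for those padding objects might look like the following Python sketch. It assumes the special section itself starts on an alignment boundary and that the linker places the objects back to back in the listed order with no extra gaps of its own; padding_plan is a hypothetical helper, and the sizes in the usage note are borrowed from the map output in the question purely as sample data.

```python
def padding_plan(object_sizes, boundary=32):
    """Compute the pad bytes needed after each object's section data.

    object_sizes is an ordered list of (name, size_in_bytes) pairs.
    Returns a list of (name, pad_bytes) so that whatever follows each
    object starts on a multiple of boundary.
    """
    plan = []
    offset = 0  # running offset from the (assumed aligned) section start
    for name, size in object_sizes:
        offset += size
        pad = -offset % boundary  # bytes up to the next boundary, 0 if already there
        plan.append((name, pad))
        offset += pad
    return plan
```

For example, padding_plan([("program1.o", 0xC24), ("program2.o", 0x4)]) asks for 28 bytes after each object, so padding1.o and padding2.o would each contribute 28 bytes to the section.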
Or you could have each program put its special data in its own uniquely named linker section. Dump out the sizes of all of these, figure out where you want them to be, and then have the script create a customized linker script which explicitly puts the named sections in the just determined places.
