Local and static variables in C (cont'd) - c

Building on my last question, I'm trying to figure out exactly how the .local and .comm directives work, and in particular how they affect linkage and storage duration in C.
So I ran the following experiment:
static int value;
which produces the following assembly code (using gcc):
.local value
.comm value,4,4
Initializing the variable to zero yields the same assembly code (using gcc):
.local value
.comm value,4,4
This sounds logical, because in both cases I would expect the variable to be stored in the bss segment. Moreover, after investigating with ld --verbose, it looks like all .comm variables are indeed placed in the bss segment:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
// ...
}
However, when I initialize my variable to a value other than zero, the compiler defines the variable in the data segment, as I would expect, and produces the following output:
.data
.align 4
.type value, #object
.size value, 4
value:
.long 1
Besides the different segments (bss and data respectively), which I now understand thanks to your earlier help, my variable has been declared with .local and .comm in the first example but not in the second. Could anyone explain the difference between the outputs produced in the two cases?

The .local directive marks a symbol as local (not externally visible) and creates it if it doesn't already exist. It's necessary for 0-initialized local symbols, because .comm declares but does not define symbols. For the 1-initialized variant, the label itself (value:) defines the symbol.
Using .local and .comm is essentially a bit of a hack (or at least a shorthand); the alternative would be to place the symbol into .bss explicitly:
.bss
.align 4
.type value, #object
.size value, 4
value:
.zero 4

The Linux kernel zeroes a process's virtual memory after allocating it, for security reasons. The compiler can therefore rely on that memory being filled with zeros and applies an optimization: if a variable is initialized to 0, there is no need to reserve space for it in the executable file (the .data section takes up space in an ELF executable, whereas the .bss section stores only its length, on the assumption that its initial contents will be zeros).

Related

Stack initialisation in GNU ARM toolchain

While checking the startup file provided as an example in the GNU ARM toolchain, I couldn't understand one thing.
The code snippets here are taken from the examples included in the GNU ARM Embedded Toolchain files downloaded from the official website. The code compiles and everything seems to work.
I am wondering why they wrote this code exactly like that. For example, why are they using the same names?
I am also wondering why my linker is not reporting a multiple definition error for __StackTop and __StackLimit. Here is the relevant part of the file startup_ARMCM0.S:
.syntax unified
.arch armv6-m
.section .stack
.align 3
#ifdef __STACK_SIZE
.equ Stack_Size, __STACK_SIZE
#else
.equ Stack_Size, 0xc00
#endif
.globl __StackTop
.globl __StackLimit
__StackLimit:
.space Stack_Size
.size __StackLimit, . - __StackLimit
__StackTop:
.size __StackTop, . - __StackTop
Yet the linker script defines the same symbols, __StackTop and __StackLimit:
.stack_dummy (COPY):
{
*(.stack*)
} > RAM
/* Set stack top to end of RAM, and stack limit move down by
* size of stack_dummy section */
__StackTop = ORIGIN(RAM) + LENGTH(RAM);
__StackLimit = __StackTop - SIZEOF(.stack_dummy);
PROVIDE(__stack = __StackTop);
The linker documentation gives the following example and explanation:
SECTIONS
{
.text :
{
*(.text)
_etext = .;
PROVIDE(etext = .);
}
}
In this example, if the program defines _etext (with a leading
underscore), the linker will give a multiple definition error. If, on
the other hand, the program defines etext (with no leading
underscore), the linker will silently use the definition in
the program. If the program references etext but does not define
it, the linker will use the definition in the linker script.
Also, when I use readelf -s to check the symbols generated from the assembly file startup_ARMCM0.S without linking, I can see the symbols __StackTop and __StackLimit with one set of values, while after linking they have the values set by the linker script (keeping in mind that a value assigned in the linker script is actually stored as the address of the symbol).

Unable to access correct global label data of assembly from C in linux

I have an assembly file (hello1.s) in which the global label A_Td is defined, and I want to access all the long data values defined under A_Td from inside a C program.
.file "hello1.s"
.globl A_Td
.text
.align 64
A_Td:
.long 1353184337,1353184337
.long 1399144830,1399144830
.long 3282310938,3282310938
.long 2522752826,2522752826
.long 3412831035,3412831035
.long 4047871263,4047871263
.long 2874735276,2874735276
.long 2466505547,2466505547
Since A_Td is defined in the text section, it is placed in the code section and only one copy is loaded into memory.
Using yasm, I generated the hello1.o file:
yasm -p gas -f elf32 hello1.s
Now, to access all the long data through the global label A_Td, I wrote the following C code (test_glob.c), taking a clue from here (global label).
//test_glob.c
#include <stdio.h>
extern int A_Td;
int main()
{
long *p;
int i;
p=(long *)(&A_Td);
for(i=0;i<16;i++)
{
printf("p+%d %p %ld\n",i, p+i,*(p+i));
}
return 0;
}
I compiled the C program with the following command and then ran it:
gcc hello1.o test_glob.c
./a.out
I get the following output:
p+0 0x8048400 1353184337
p+1 0x8048404 1353184337
p+2 0x8048408 1399144830
p+3 0x804840c 1399144830 -----> correct till this place
p+4 0x8048410 -1012656358 -----> incorrect value retrieved from this place
p+5 0x8048414 -1012656358
p+6 0x8048418 -1772214470
p+7 0x804841c -1772214470
p+8 0x8048420 -882136261
p+9 0x8048424 -882136261
p+10 0x8048428 -247096033
p+11 0x804842c -247096033
p+12 0x8048430 -1420232020
p+13 0x8048434 -1420232020
p+14 0x8048438 -1828461749
p+15 0x804843c -1828461749
Only the first 4 long values are accessed correctly from the C program. Why is this happening?
What needs to be done inside the C program to access the rest of the data correctly?
I am using Linux. Any help resolving this issue, or any link, would be much appreciated. Thanks in advance.
How many bytes does "long" have in this system?
It seems to me that printf interprets the numbers as four-byte signed integers. The value 3282310938 is C3A4171A in hex, which is above 7FFFFFFF (decimal 2147483647), the largest positive four-byte signed number, and so it is printed as the negative value -1012656358.
I assume that the assembler simply treats these four-byte numbers as unsigned.
If you used %lu instead of %ld, printf would interpret the numbers as unsigned and should show what you expected.

NULL terminator on string included via AS's incbin directive

I have some large string resources located in files that I include in my executable. I include them in the executable using the following. The *.S allows GCC to invoke as to produce the object file without any special processing.
;; ca_conf.S
.section .rodata
;; OpenSSL's CA configuration
.global ca_conf
.type ca_conf, #object
.align 8
ca_conf:
ca_conf_start:
.incbin "res/openssl-ca.cnf"
ca_conf_end:
.byte 0
;; The string's size (if needed)
.global ca_conf_size
.type ca_conf_size, #object
.align 4
ca_conf_size:
.int ca_conf_end - ca_conf_start
I add a .byte 0 after including the string to ensure the string is NULL terminated. That allows me to use ca_conf as a C const char*, or {ca_conf,ca_conf_size} as a C++ string.
Will the assembler or linker rearrange things such that the NULL terminator could become separated from the string it's terminating? Or will the assembler and linker always keep them together?
Because you're in assembler they will be kept together.
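One other point: the .align 4 before ca_conf_size can insert padding bytes after the terminator (up to 3 for a 4-byte alignment). Because ca_conf_end is placed before the .byte 0, the value stored in ca_conf_size is the length of the file contents only; it includes neither the terminator nor any padding.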

Adding section to GNU linker script

Hi, I am trying to define a custom section in my linker script in the following way:
.version_section(__custom_data__) :
{
KEEP (*version_info.o (.rodata* .data* .sdata*))
}
I am compiling a C file that contains a structure, and I want that structure to always be stored in this version_section.
version_info ver_info __attribute__ ((section(".version_section"))) = {7, 10, 2013, 17, 17, "some_type", "some_sw_version", "some_version"} ;
Up to this stage everything works fine. However, the generated section has the flags "AW", whereas I need the flags to be "A".
So I am using an assembler file that defines this section with the "a" flag, like this:
.section .version_section,"a", #progbits
.align 8
.globl __custom_data__
.type __custom_data__, #function
__custom_data__:
.word 0
.size __custom_data__, .-__custom_data__
.space (0x1024-0x4), 0
But I still see the default flags on version_section, i.e. AW, in the readelf output:
[11] .version_section PROGBITS 00011088 004088 001044 00 WA 0 0 8
What am I doing wrong here?
It appears that "W" meant writable in readelf output, as I suspected. Adding the const qualifier to the definition of ver_info moved it to the desired segment in memory.

static variable storage

In C, where is a static variable stored in memory? Suppose there are two static variables, one local to a function and the other global. How are these entries maintained in the symbol table? Please explain.
In C, they can be stored wherever the implementation sees fit. The C standard does not dictate how the implementation does things, only how it behaves.
Typically, all static storage duration variables (statics within a function and all variables outside a function) will be stored in the same region, regardless of whether they are at file level or within a function.
That bit in parentheses above is important. Outside of a function, static doesn't decide the storage duration of a variable like it does within a function. It decides whether the variable is visible outside of the current translation unit. All variables outside of functions are static storage duration.
And, regarding the symbol table, that's a construct that exists only during the build process. Once an executable is generated, there are no symbols (debugging information excluded of course, but that has nothing to do with the execution of code). All references to variables at that point will almost certainly be hard-coded addresses or offsets.
In other words, it's the compiler that figures out which variable you're referring to with a name.
You can see an example here as to how the variables are stored. Consider the following little C program:
#include <stdio.h>
int var1;
static int var2;
int main (void) {
int var3;
static int var4;
var1 = 111;
var2 = 222;
var3 = 333;
var4 = 444;
return 0;
}
This generates the following assembly:
.file "qq.c"
.comm var1,4,4
.local var2
.comm var2,4,4
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $111, var1
movl $222, var2
movl $333, -4(%ebp)
movl $444, var4.1705
movl $0, %eax
leave
ret
.size main, .-main
.local var4.1705
.comm var4.1705,4,4
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
And you can see that var1, var2 and var4 (the static storage duration ones) all have a .comm line to mark them as common entries, subject to consolidation by the linker.
In addition, var2 and var4 (the ones that are invisible outside the current translation unit) both have a .local line, so that the linker won't use them to satisfy unresolved externals in other object files. var3 is an automatic variable that lives on the stack (-4(%ebp)), so it has no symbol at all.
And, by examining the output of ld --verbose while linking a file, you can see that all common entries end up in the .bss area:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
: : :
}
It's impossible to generalize to every compiler, but this is how it's most often done.
There will be a block of memory set aside by the linker for variables which are initialized at load time but modifiable at run time. All static variables will be placed in this block no matter if they are local or global.
Given the following source:
static int a_static_var = 5;
void foo(void)
{
static int a_static_var = 6;
return;
}
Visual Studio compiles the variables as follows (at least in this instance; details will vary from compiler to compiler and depend on options):
_DATA SEGMENT
_a_static_var DD 05H
?a_static_var@?1??foo@@9@9 DD 06H ; `foo'::`2'::a_static_var
_DATA ENDS
So both static variables end up in the data segment. The static that's scoped to a function has its name mangled in such a way that it will not 'match up' with a similarly named variable in a different function or source file.
Compiler implementations are free to handle this in whatever manner they like, but the general idea will usually be similar.

Resources