NULL terminator on string included via AS's incbin directive - c

I have some large string resources in files that I include in my executable, using the following assembly file. Giving it a .S extension lets GCC run it through the C preprocessor and then invoke as to produce the object file, with no other special processing.
/* ca_conf.S */
.section .rodata
/* OpenSSL's CA configuration */
.global ca_conf
.type ca_conf, @object
.align 8
ca_conf:
ca_conf_start:
.incbin "res/openssl-ca.cnf"
ca_conf_end:
.byte 0
/* The string's size (if needed) */
.global ca_conf_size
.type ca_conf_size, @object
.align 4
ca_conf_size:
.int ca_conf_end - ca_conf_start
I add a .byte 0 after the included data to ensure the string is NUL-terminated. That allows me to use ca_conf as a C const char*, or {ca_conf, ca_conf_size} as a C++ string.
Will the assembler or linker rearrange things such that the NUL terminator could become separated from the string it's terminating? Or will the assembler and linker always keep them together?

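Because both are in the same section of the same assembly file, they will be kept together: the assembler emits bytes in exactly the order you wrote them, and the linker moves each input section as a single unit without reordering its contents.
One other point: the .align 4 can insert up to 3 padding bytes after the .byte 0. Those bytes lie outside the ca_conf_start..ca_conf_end range, so ca_conf_size is exactly the length of the included file and counts neither the terminator nor the padding.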
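A minimal way to check this layout without a separate .S file is to emit the same directives from C with GCC's top-level asm. This sketch assumes GCC or Clang with GNU as on an ELF target such as Linux, and it swaps .incbin for a short hypothetical .ascii string so it needs no external file; `ca_conf_is_terminated` is an illustrative helper, not part of the original code.

```c
#include <string.h>

/* Same layout as ca_conf.S, emitted through GCC top-level asm.
   .pushsection/.popsection restore whatever section the compiler was using.
   The .ascii line is a hypothetical stand-in for .incbin "res/openssl-ca.cnf". */
__asm__(
    ".pushsection .rodata\n"
    ".global ca_conf\n"
    ".global ca_conf_size\n"
    ".balign 8\n"
    "ca_conf:\n"
    "ca_conf_start:\n"
    ".ascii \"[ ca ]\\ndefault_ca = CA_default\\n\"\n"
    "ca_conf_end:\n"
    ".byte 0\n"               /* terminator: after ca_conf_end, so not counted */
    ".balign 4\n"             /* any padding also falls outside start..end */
    "ca_conf_size:\n"
    ".int ca_conf_end - ca_conf_start\n"
    ".popsection\n");

/* C views of the two symbols. .int is 4 bytes on common ELF targets. */
extern const char ca_conf[];
extern const unsigned int ca_conf_size;

/* Illustrative check: the byte right after the data is the NUL, and
   strlen agrees with the size computed by the assembler. */
int ca_conf_is_terminated(void)
{
    return ca_conf[ca_conf_size] == '\0' && strlen(ca_conf) == ca_conf_size;
}
```

With the real ca_conf.S you would keep only the two extern declarations on the C side.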

Related

Unable to access correct global label data of assembly from C in linux

I have an assembly file (hello1.s) that defines the global label A_Td, and I want to access all the long values stored under A_Td from a C program.
.file "hello1.s"
.globl A_Td
.text
.align 64
A_Td:
.long 1353184337,1353184337
.long 1399144830,1399144830
.long 3282310938,3282310938
.long 2522752826,2522752826
.long 3412831035,3412831035
.long 4047871263,4047871263
.long 2874735276,2874735276
.long 2466505547,2466505547
Because A_Td is defined in the .text section, it is placed in the code section, and only one copy is loaded into memory.
Using yasm, I generated hello1.o:
yasm -p gas -f elf32 hello1.s
Now, to access all the long data through the global label A_Td, I wrote the following C code (test_glob.c), based on an existing answer about global labels.
//test_glob.c
#include <stdio.h>
extern int A_Td; /* only its address is used below */
int main(void)
{
long *p;
int i;
p=(long *)(&A_Td);
for(i=0;i<16;i++)
{
printf("p+%d %p %ld\n",i, p+i,*(p+i));
}
return 0;
}
I compiled and ran the program with the following commands:
gcc hello1.o test_glob.c
./a.out
I get the following output:
p+0 0x8048400 1353184337
p+1 0x8048404 1353184337
p+2 0x8048408 1399144830
p+3 0x804840c 1399144830 -----> correct till this place
p+4 0x8048410 -1012656358 -----> incorrect value retrieved from this place
p+5 0x8048414 -1012656358
p+6 0x8048418 -1772214470
p+7 0x804841c -1772214470
p+8 0x8048420 -882136261
p+9 0x8048424 -882136261
p+10 0x8048428 -247096033
p+11 0x804842c -247096033
p+12 0x8048430 -1420232020
p+13 0x8048434 -1420232020
p+14 0x8048438 -1828461749
p+15 0x804843c -1828461749
Only the first 4 long values are read correctly from the C program. Why is this happening?
What needs to be done in the C program to access the rest of the data correctly?
I am using Linux. Any help resolving this issue, or any link, would be a great help. Thanks in advance.
How many bytes is a long on this system?
It seems to me that printf interprets the numbers as four-byte signed integers. The value 3282310938 is C3A4171A in hex, which is above 7FFFFFFF (2147483647 in decimal), the largest positive four-byte signed number, so it is printed as the negative value -1012656358.
The assembler just stores these four-byte values as bit patterns; nothing in the object file says whether they are signed or unsigned, so the type used in C decides how they are read.
If you declare p as unsigned long * and print with %lu instead of %ld, printf will interpret the numbers as unsigned and should show what you expected.
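To see both readings side by side, here is a small sketch using the fixed-width types from <stdint.h>, which avoid depending on the size of long. It copies the first value of each .long pair into a hypothetical C array `td_sample` so it runs without hello1.o; with the real object you would instead declare something like `extern const uint32_t A_Td[];` and index it directly.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the assembly table: the first value of each
   .long pair in A_Td, written as C constants so the sketch links on its own. */
static const uint32_t td_sample[] = {
    1353184337u, 1399144830u, 3282310938u, 2522752826u,
    3412831035u, 4047871263u, 2874735276u, 2466505547u,
};

/* Reinterpret a 32-bit pattern as signed. int32_t is guaranteed to be
   two's complement, so copying the bytes is well defined. */
int32_t as_signed32(uint32_t v)
{
    int32_t s;
    memcpy(&s, &v, sizeof s);
    return s;
}

/* Print every entry both ways: the unsigned column matches the .long
   source, the signed column reproduces what %ld showed with a 4-byte long. */
void print_td_sample(void)
{
    size_t n = sizeof td_sample / sizeof td_sample[0];
    for (size_t i = 0; i < n; i++)
        printf("%zu: %" PRIu32 " (as signed: %" PRId32 ")\n",
               i, td_sample[i], as_signed32(td_sample[i]));
}
```

Values below 2147483648 print identically in both columns, which is why the first rows of the original output looked correct.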

Is it possible to force GCC to pad string constants in .rodata

I'm working on porting some code to an environment with stricter alignment requirements than x86, but for the time being I'm changing and testing it on an x86 Linux machine, because hardware access is easier there, among other reasons.
I've distilled the first problem I ran into down to the following concise example:
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 1024
#define DMQUOTE_LOG "DMQUOTELOG"
void aFunction (const char *configPath)
{
char LogFilename[BUFFER_SIZE] __attribute ((aligned));
// printf ("A\n");
strcpy (LogFilename, configPath);
strcat (LogFilename, DMQUOTE_LOG);
printf ("Log: %s\n", LogFilename);
}
int main (int argc, char **argv)
{
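/* Set EFLAGS.AC (bit 18, 0x40000) so misaligned accesses fault.
   32-bit x86 only: it addresses the stack through %esp. */
__asm__("pushf\n"
"orl $0x40000, (%esp)\n"
"popf");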
aFunction ("");
return 0;
}
Running this code as is provides the expected output. However, uncommenting the other printf causes a bus error to trigger on the strcat line.
It looks to me as if the cause is that introducing a second string constant shifts the constant from the #define so that it is no longer aligned. This is supported by the observation that changing the string constant from "A\n" to "AAA\n" makes everything work again (and gcc, seemingly by magic, replaces the call to printf with a call to puts and drops the \n from the constant).
Is there some nice way to make gcc insert extra padding between all of the string constants that it's inserting into the .rodata section so that things align properly?
[EDIT]
As mentioned by fucanchik below, here's what the .rodata section of the above is (with the extra printf enabled):
.file "sample.c"
.section .rodata
.LC0:
.string "A"
.LC1:
.string "DMQUOTELOG"
.LC2:
.string "Log: %s\n"
.text
.globl aFunction
...
There is no forced alignment, which makes sense because I'm compiling for x86, which doesn't strictly require it. Naturally, modifying the assembly as follows has the desired effect. However, I can't see a way to get gcc to do this on its own. Of course, this may be moot anyway if glibc itself can't handle running in this mode in the general case.
.file "sample.c"
.section .rodata
.LC0:
.string "A"
.align 4,0
.LC1:
.string "DMQUOTELOG"
.LC2:
.string "Log: %s\n"
.text
.globl aFunction
...
There does not seem to be any way to accomplish this, at least with GCC. Testing seems to indicate that although the compiler will align integers, doubles and so on, it feels no need to align string constants: they are made of characters, and alignment for character data is on byte boundaries.
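GCC won't pad anonymous string literals, but it does honour an explicit alignment on a named object. As a workaround sketch (an assumption, not something the answer tested), the text of DMQUOTE_LOG can live in an aligned array; `dmquote_log`, `dmquote_log_is_aligned` and `build_log_filename` are hypothetical names. This only fixes the alignment of that one source string; glibc may still perform other unaligned accesses.

```c
#include <stdint.h>
#include <string.h>

#define BUFFER_SIZE 1024

/* Named array instead of the bare "DMQUOTELOG" literal, so the
   aligned attribute can force an 8-byte boundary. */
static const char dmquote_log[] __attribute__((aligned(8))) = "DMQUOTELOG";

/* Returns 1 if the string really starts on an 8-byte boundary. */
int dmquote_log_is_aligned(void)
{
    return ((uintptr_t)dmquote_log % 8) == 0;
}

/* Same work as aFunction, minus the printing, writing into the caller's buffer. */
void build_log_filename(char out[BUFFER_SIZE], const char *configPath)
{
    strcpy(out, configPath);
    strcat(out, dmquote_log);
}
```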
The particulars of this bus error seem to indicate that glibc uses optimized routines that copy data a word at a time without checking alignment first (I haven't looked at the source, so I don't know whether this is true).
This led me to investigate musl, an alternative libc implementation that is simple to install and use on a per-project basis. The C source of the musl version of strcat takes care to copy unaligned bytes before copying words at a time, so this particular issue goes away, although naturally others remain.
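The "copy bytes until aligned, then copy words" strategy can be sketched as follows. This is a simplified illustration, not musl's actual code: `aligned_strcat` and `copy_aligned_words` are hypothetical names, and the lengths are computed up front so no read goes past either string. Word-sized moves are attempted only when source and destination end up aligned together.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_SIZE sizeof(size_t)

/* Copy n bytes, using word-sized moves only at aligned addresses. */
static void copy_aligned_words(char *d, const char *s, size_t n)
{
    /* Copy single bytes until the destination is word-aligned. */
    while (n && ((uintptr_t)d % WORD_SIZE)) {
        *d++ = *s++;
        n--;
    }
    /* Move whole words only if the source is now aligned as well. */
    if ((uintptr_t)s % WORD_SIZE == 0) {
        for (; n >= WORD_SIZE; n -= WORD_SIZE, d += WORD_SIZE, s += WORD_SIZE) {
            size_t w;
            memcpy(&w, s, WORD_SIZE);   /* aligned word load */
            memcpy(d, &w, WORD_SIZE);   /* aligned word store */
        }
    }
    /* Remaining tail, or everything if the alignments never matched. */
    while (n--)
        *d++ = *s++;
}

/* strcat replacement built on the helper above. */
char *aligned_strcat(char *dst, const char *src)
{
    size_t dlen = strlen(dst);
    copy_aligned_words(dst + dlen, src, strlen(src) + 1); /* +1 copies the NUL */
    return dst;
}
```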

String literal in C program, can it be found in binary?

For example:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc < 2 || strcmp(argv[1], "password") != 0) {
printf("Incorrect password\n");
}
return 0;
}
Can I disassemble the binary for this compiled program and see the string "password" somewhere in the binary or is it only visible during run-time?
Typically, yes. Moreover, you don't need to "disassemble" anything. Most of the time you will be able to see it right in the compiled binary by opening it in a text or hex editor.
ASCII strings do not undergo any special encoding/decoding, so they appear literally in the binary and will appear when the binary is interpreted as a (mostly garbage-y-looking) ASCII file. If you think about it more deeply, the only systematic alternative to storing them in the binary would be some horrible OS-wide central registry of all strings for all programs. If they were stored in a separate file they could get separated from the binary.
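Searching a file for the literal bytes is essentially what a hex editor's "find" (or the `strings` tool) does. A minimal sketch, with `stream_contains` as a hypothetical helper: it reads a whole stream into memory and does a naive byte search, which is fine for small binaries but not for huge files.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of needle (without its NUL) occur anywhere in the
   stream, 0 if not, -1 on allocation or read error. */
int stream_contains(FILE *f, const char *needle)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf)
        return -1;
    /* Slurp the whole stream, doubling the buffer as needed. */
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(f)) {
        free(buf);
        return -1;
    }
    /* Naive search; embedded NULs in the binary are handled by memcmp. */
    size_t m = strlen(needle);
    int found = 0;
    for (size_t i = 0; !found && m <= len && i <= len - m; i++)
        found = memcmp(buf + i, needle, m) == 0;
    free(buf);
    return found;
}
```

For example, opening the compiled program with fopen(path, "rb") and calling stream_contains(f, "password") should report the literal.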
However, the OP's question raises a larger one about code layout and what compilation does with read-only data such as strings. A more educational way to 'find' the string is to look at the intermediate compilation stage, human-readable assembly, where the string is laid out and referenced by a label. The assembler and linker (the next stages) then turn that label into a concrete address in the binary. Note the .rodata ("read-only data") section in the output below.
From the gcc manpage:
-S Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
Results:
.file "foo.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "password"
.LC1:
.string "Incorrect password"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
[assembly language instructions follow]

Adding section to GNU linker script

I am trying to define a custom section in my linker script in the following way:
.version_section(__custom_data__) :
{
KEEP (*version_info.o (.rodata* .data* .sdata*))
}
I am compiling a C file that contains a structure, and I want that structure to always be stored in this version_section.
version_info ver_info __attribute__ ((section(".version_section"))) = {7, 10, 2013, 17, 17, "some_type", "some_sw_version", "some_version"} ;
Up to this stage everything works fine. But the generated section has the flags "WA", whereas I need them to be just "A".
So I am using an assembler file that defines this section with the "a" flag, like this:
.section .version_section,"a",@progbits
.align 8
.globl __custom_data__
.type __custom_data__, @function
__custom_data__:
.word 0
.size __custom_data__, .-__custom_data__
.space (0x1024-0x4), 0
But readelf still shows the default flags (WA) for version_section:
[11] .version_section PROGBITS 00011088 004088 001044 00 WA 0 0 8
What am I doing wrong here?
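It appears that "W" means writable in the readelf output, as I suspected. Adding the const qualifier to the definition of ver_info moved it to the desired read-only section. This also explains why the assembly file alone didn't help: the linker merges all input sections named .version_section into one output section, and that output section is writable if any of its inputs is, so the C object's writable contribution made the whole section "WA".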
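A sketch of the const fix, assuming GCC. The question doesn't show version_info's definition, so the field names and the fixed-size char arrays below are hypothetical. The arrays are a deliberate choice: with -fPIC/-fPIE, a const object that contains pointers may still be put in a writable section because it needs relocations, which would bring the "W" flag back.

```c
/* Hypothetical layout that fits the initializer from the question. */
typedef struct {
    int day, month, year, hour, minute;
    char type[16];
    char sw_version[32];
    char version[16];
} version_info;

/* const lets GCC emit the section as read-only ("a"), matching the
   assembly file's flags; "used" keeps it even if nothing references it. */
__attribute__((section(".version_section"), used))
const version_info ver_info = {7, 10, 2013, 17, 17,
                               "some_type", "some_sw_version", "some_version"};
```

After linking, readelf -S should list .version_section with only the A flag, provided no other input contributes a writable piece.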

Local and static variables in C (cont'd)

Building on my last question, I'm trying to figure out how the .local and .comm directives work exactly, and in particular how they affect linkage and storage duration in C.
So I've run the following experiment:
static int value;
which produces the following assembly code (using gcc):
.local value
.comm value,4,4
Initializing it explicitly to zero (static int value = 0;) yields the same assembly code (using gcc):
.local value
.comm value,4,4
This sounds logical because in both cases I would expect the variable to be stored in the bss segment. Moreover, after investigating with ld --verbose, it looks like all .comm variables are indeed placed in the bss segment:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
// ...
}
However, when I initialize my variable to a value other than zero, the compiler defines the variable in the data segment, as I expected, and produces the following output:
.data
.align 4
.type value, @object
.size value, 4
value:
.long 1
Besides the different segments (bss and data respectively), which I now understand thanks to your help with my previous question, my variable is declared with .local and .comm in the first example but not in the second. Could anyone explain the difference between the two outputs?
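The .local directive marks a symbol as local (not visible outside the object file) and creates it if it doesn't already exist. It's needed for the zero-initialized variable because .comm on its own declares a common symbol, which is global and whose storage is allocated by the linker; the combination of .local and .comm instead tells the assembler to allocate aligned local storage itself. For the variant initialized to 1, the label itself (value:) defines the symbol, and since there is no .globl directive it is local by default.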
Using .local and .comm is essentially a bit of a hack (or at least a shorthand); the alternative would be to place the symbol into .bss explicitly:
.bss
.align 4
.type value, @object
.size value, 4
value:
.zero 4
The Linux kernel zero-fills the memory it gives a process, for security reasons (data left over from other processes must not leak), and C already requires static variables without an initializer to start out as zero. So the compiler can optimize: if a variable is initialized to 0, there's no need to reserve space for it in the executable file (the .data section takes up space in the ELF file, whereas the .bss section records only its size, on the assumption that its initial contents are zeros).
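A small experiment makes the difference visible. The sketch below (array names and sizes are arbitrary illustrations) defines two large zero-initialized static arrays, which GCC should place in .bss, and one with a single non-zero element, which forces the whole array into .data. Running binutils' size on the resulting executable should show the data column roughly 256 KiB larger, while the bss column grows without increasing the file size.

```c
#include <stddef.h>

#define N (1 << 16) /* 65536 ints, 256 KiB with a 4-byte int */

/* Zero-initialized, implicitly or explicitly: expected in .bss. */
static int implicit_zero[N];
static int explicit_zero[N] = {0};

/* One non-zero element forces the whole array into .data. */
static int non_zero[N] = {1};

/* Returns 1 if every element of both zero-initialized arrays is zero. */
int zero_arrays_are_zero(void)
{
    for (size_t i = 0; i < N; i++)
        if (implicit_zero[i] != 0 || explicit_zero[i] != 0)
            return 0;
    return 1;
}

/* Reads from the .data array so it is not optimized away. */
int first_non_zero(void)
{
    return non_zero[0];
}
```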
