How to realign sections of an x64 portable executable based on the section alignment field - linker

I have been searching high and low for an easy way to manually realign any x64 PE to a section alignment of 0x10000, and all I could find was material on PE file alignment or realignment.
As a test case I used the source code of Notepad++ and compiled it with VS 2022 Community.
I built it twice, first with a section alignment of 0x1000 (the default, equal to the page size) and then with a section alignment of 0x10000, and compared the two files.
The VS linker left the file layout (raw address, raw size) identical in both files.
The only things that changed were the size of image, the section alignment, the base of code, and the virtual addresses of the sections (in the first file the .text section starts at 0x1000, in the second at 0x10000).
I then made a copy of the first file (0x1000 alignment) and manually changed all those fields to realign it to 0x10000, which resulted in an invalid PE.
When the VS linker does it, the result is valid.
How do I compute the correct virtual addresses of the PE sections for a section alignment of 0x10000?

I have assembled and linked two simple PE files: t1.asm with standard section alignment 4 KiB
EUROASM CPU=X64, TimeStamp=0
t1 PROGRAM Format=PE, Width=64, Entry=Start, IconFile=, \
FileAlign=512, SectionAlign=0x1000
[.text]
Start: RET
[.data]
DB "PE"
ENDPROGRAM
and t2.asm with alignment increased to 64 KiB:
EUROASM CPU=X64, TimeStamp=0
t2 PROGRAM Format=PE, Width=64, Entry=Start, IconFile=, \
FileAlign=512, SectionAlign=0x10000
[.text]
Start: RET
[.data]
DB "PE"
ENDPROGRAM
assembled and linked them with EuroAssembler and compared the executables:
R:\>euroasm t1.asm, t2.asm, nowarn=0..999
R:\>dir t?.exe
Directory of R:\
07.07.2022 09:48 1 026 t1.exe
07.07.2022 09:48 1 026 t2.exe
R:\>fc t1.exe t2.exe
Comparing files t1.exe and T2.EXE
000000AD: 10 00
000000AE: 00 01
000000B1: 10 00
000000B2: 00 01
000000B9: 10 00
000000BA: 00 01
000000BD: 10 00
000000BE: 00 01
000000C9: 10 00
000000CA: 00 01
000000E1: 30 00
000000E2: 00 03
000001A1: 10 00
000001A2: 00 01
000001A5: 10 00
000001A6: 00 01
000001BE: D0 50
000001C9: 10 00
000001CA: 00 01
000001CD: 20 00
000001CE: 00 02
000001E6: D0 50
As expected, in t1.exe sections .text and .data start at RVA 0x1000 and 0x2000, while in t2.exe they start at 0x10000 and 0x20000.
According to the MS PE32+ specification, the differences at file offsets 000000AD..000000B2 are in SizeOfCode and SizeOfInitializedData, the sizes in virtual address space rounded up to section alignment.
The differences at B8..BF are in AddressOfEntryPoint and BaseOfCode (RVA of the first byte in .text).
The difference at C8..CB corresponds to SectionAlignment defined in the optional header.
The difference at E0..E3 corresponds to SizeOfImage, rounded up to section alignment.
The following differences concern section headers.
1A0..1A7 describe VirtualSize and VirtualAddress (RVA), both increased from 0x1000 to 0x10000. The difference at 1BE concerns the section-header Characteristics, where the value 0x00D00000 specifies 4 KiB section alignment and 0x00500000 specifies 16 B alignment (64 KiB section alignment isn't available in the MS specification, so €ASM reverts to the default 16 B; in any case, this characteristic is relevant in object files only).
The remaining five differences are analogous for section .data.
As you can see, there are many places where you will need to manually change the contents of PE file to keep it valid.
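The rule behind those header fields can be sketched in a few lines of Python. This is only a minimal sketch, not what any particular linker does internally: the helper names align_up and layout_sections are made up for illustration, the header size of 0x200 is an assumption based on FileAlign=512, and the content sizes 1 and 2 are the single RET byte and the two bytes of "PE" from the sources above. It recomputes RVAs and SizeOfImage only. In a real image such as Notepad++, the code, the data directories and other tables also contain RVAs computed for the old layout, which is a likely reason why a hand-edited copy ends up invalid.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_sections(content_sizes, size_of_headers, section_alignment):
    """Assign consecutive RVAs to sections; return (rvas, size_of_image)."""
    rva = align_up(size_of_headers, section_alignment)
    rvas = []
    for size in content_sizes:
        rvas.append(rva)
        # Each (non-empty) section occupies a whole number of alignment units.
        rva += align_up(size, section_alignment)
    return rvas, rva


# .text holds 1 byte (RET), .data holds 2 bytes ("PE");
# 0x200 is an assumed size of the headers.
for alignment in (0x1000, 0x10000):
    rvas, size_of_image = layout_sections([1, 2], 0x200, alignment)
    print(hex(alignment), [hex(r) for r in rvas], hex(size_of_image))
```

With these inputs the sketch places .text and .data at 0x1000 and 0x2000 with SizeOfImage 0x3000 for 4 KiB alignment, and at 0x10000 and 0x20000 with SizeOfImage 0x30000 for 64 KiB alignment, which matches the fc output above.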

Related

Which Clang/GCC linker flag should be used to produce offsets in code that stay within the binary range?

I'm trying to link my code with an external static library, that has this piece of code in the binary:
0000000000000000 <some_method>:
0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <some_method+0x7>
7: c3 retq
After linking with my code, the linker writes an actual offset instead of the zeroes:
00000000000175c0 <some_method>:
175c0: 48 8d 05 39 aa 20 00 lea 0x20aa39(%rip),%rax # 222000 <some_method.method>
175c7: c3 retq
Offset 222000 is supposed to be in the .data section according to the readelf output, which is supposed to be OK, but the problem is that I need to copy my binary code "as is" into some memory space and make it run from there, without using any OS loaders that know how to relocate different sections of the binary in the process address space. The memory address to which I load my binary can change too, so I can't use static non-relative offsets in my code either.
I want all the RIP-relative offsets in my code to stay within the binary's file size range. For example, if my binary is 0x10000 bytes in size and I load it at address 0x200000, I don't want any offset to reach beyond address 0x210000. Is there a way to tell the linker to do that?
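For reference, the target shown in the linked disassembly can be checked by hand. The sketch below (the function name rip_relative_target is made up for illustration) decodes the four displacement bytes of the lea instruction and adds them to the address of the following instruction, which is what RIP holds when the operand is resolved.

```python
def rip_relative_target(insn_addr, insn_len, disp_bytes):
    """Return the address a RIP-relative operand refers to."""
    disp = int.from_bytes(disp_bytes, "little", signed=True)
    # RIP holds the address of the *next* instruction at this point.
    return insn_addr + insn_len + disp


# 175c0: 48 8d 05 39 aa 20 00   lea 0x20aa39(%rip),%rax
target = rip_relative_target(0x175C0, 7, bytes.fromhex("39aa2000"))
print(hex(target))
```

The displacement depends only on the distance between the instruction and its target, not on the load address, so the open issue in the question is the distance itself (0x20aa39 bytes), not the use of RIP-relative addressing.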

Calling an external program, downloaded in runtime and getting return data on bare metal

I have a system with an MPC8548 processor. My job is to send a second (external) program to the running firmware and execute it.
I have a cross compiler set up to build the base firmware and the external code.
I was able to send a separate program in runtime (I'm not sure if it's correct, but it seems to be working). Here's how I did it: in the original firmware I made room for the external code by inserting a lot of nop instructions starting at a specific memory address in the .text section. After that, in runtime, I copied the received external code (the compiled code using gcc, the .o file) to that memory address, using memcpy. Then in the original firmware I made a function pointer, pointing to that memory location and called it.
This is the external, simple code:
int external_main(void)
{
    int x = 11;
    return x;
}
And this is how I called it in the original firmware:
int (*extFuncPtr)(void);
extFuncPtr = &external_start;
int x = extFuncPtr();
external_start is specified in the linker script's .text section (it is the starting address of the nop instructions), and in the code it is declared as extern unsigned char external_start; I've printed out that memory region and the external code is there and it runs successfully (if I download some garbage there, it crashes).
My problem is that I cannot get any return data from the external code. The value of x will be garbage. I tried passing an int* as a function parameter, but it's still empty. I also tried writing something to a fixed address (in the external code) and reading the same address in the firmware, but that didn't work either.
My question: is it possible to get any data from the external code? I'm probably doing something wrong, or missing something.
Are there other possibilities to solve this problem?
EDIT 1
This is the generated assembly of the external code (calling objdump):
00000000 <external_main>:
0: 94 21 ff e0 stwu r1,-32(r1)
4: 93 e1 00 1c stw r31,28(r1)
8: 7c 3f 0b 78 mr r31,r1
c: 39 20 00 0b li r9,11
10: 91 3f 00 08 stw r9,8(r31)
14: 81 3f 00 08 lwz r9,8(r31)
18: 7d 23 4b 78 mr r3,r9
1c: 39 7f 00 20 addi r11,r31,32
20: 83 eb ff fc lwz r31,-4(r11)
24: 7d 61 5b 78 mr r1,r11
28: 4e 80 00 20 blr
If I put that external_main function in the firmware, it generates the same instructions. And when I call it, it looks like this:
6dbdc: 4b ff 74 b5 bl 65090 <external_main>
6dbe0: 90 7f 00 3c stw r3,60(r31)
And these are the generated instructions of the function pointer part:
6dbb8: 3d 20 00 15 lis r9,21
6dbbc: 39 29 18 ec addi r9,r9,6380
6dbc0: 91 3f 00 34 stw r9,52(r31)
6dbc4: 81 3f 00 34 lwz r9,52(r31)
6dbc8: 7d 29 03 a6 mtctr r9
6dbcc: 4e 80 04 21 bctrl
6dbd0: 90 7f 00 38 stw r3,56(r31)
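As a sanity check, the two call sequences above can be decoded by hand to see where each one jumps. This is a small sketch with made-up helper names (lis_addi, bl_target); it only handles the two instruction forms shown here. If the linker script placed external_start where intended, the first value printed should be its address.

```python
def lis_addi(hi, lo):
    """Value built by `lis rX,hi` followed by `addi rX,rX,lo`.

    `lo` is the signed 16-bit immediate as printed by objdump.
    """
    return ((hi << 16) + lo) & 0xFFFFFFFF


def bl_target(insn_addr, insn_word):
    """Target of a relative PowerPC I-form branch (b/bl with AA=0)."""
    offset = insn_word & 0x03FFFFFC      # LI field, already a byte offset
    if offset & 0x02000000:              # sign bit of the 26-bit offset
        offset -= 0x04000000
    return insn_addr + offset


# lis r9,21 ; addi r9,r9,6380 ; mtctr r9 ; bctrl
print(hex(lis_addi(21, 6380)))
# 6dbdc: 4b ff 74 b5   bl 65090 <external_main>
print(hex(bl_target(0x6DBDC, 0x4BFF74B5)))
```

The pointer call goes to 0x1518ec and the direct call to 0x65090, so both sequences use the same calling convention and read the result from r3; they differ only in how the target address is obtained.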
EDIT 2
So I've tried a lot of things to see what the problem might be, including downloading the whole main firmware as the external program (aka injecting the firmware into the firmware in runtime). I called a function which returns a number and it worked. So I started clearing out the secondary firmware (throwing out everything unnecessary), constantly checking if it is still working or not. And it worked until the binary size became less than 16 Kilobytes.
When the external program's size is less than 16 KB, it just doesn't work (can't get any return values), but when it's 16 KB or larger, it works. It returns values; also, if I pass a pointer to an array (allocated in the base firmware) as an input parameter and populate it in the external code, the array is populated when it returns.
Don't know why this happens, but it works. The external code won't be smaller than 16 KB anyway, so, it's a win in my book. If anyone has some idea why it behaves like this, I'll be glad to hear that.

Array in Hexadecimal in Assembly x86 MASM

If:
(I believe the registers are adjacent to one another...)
A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
B WORD 0d30, 0d40, 0d70, 0hB
D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
Then:
What is [A+2]? The answer is 0d20 or 0x15
What is [B+2]? The answer is 40 or 0x28
What is [D+4]? Not sure
What is [D-10]? Not sure
I think those are the answers but I am not sure. Since a WORD is 1 BYTE, AND DWORD is 2 WORDS, then as a result when you are counting the array of [B+2] for example, you should be starting at 0d30, then 0d40 (count two WORD). And [A+2] is 0d20 because you are counting two bytes. What am I doing wrong? Please help. Thank you
EDIT
So is it because: Taking into account that the first value of A,B, and D are offsets x86 is little endian...
A = 0d10, count 2 more from there
B...bytes (in decimal) = 30,0,40,0,70,0,11,0 B is 0d40, count 2 more bytes from that
D...bytes (in hex) = 0x200, 0,0,0,...0,2,0,0,...0x10,3,0,0,...0,4,0,0,...0,5,0,0,...0,6,0,0 D is 0x200. Count 4 bytes from there. Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C? Thank you
Also if I did [B-3], would it be 0d13? I was told it actually is between 0d10 and 0d13 such that it will be 0A0D and due to little endian will be 0D0A. Is that correct? Thank you!!
EDIT
A WORD is 2 BYTEs. A DWORD is two WORDs ("D" stands for "double"). A QWORD is 4 WORDs ("Q" stands for "quad").
Memory is addressed in bytes, i.e. the content of memory can be viewed as follows (for three bytes with values 0xB, 20, 10):
address | value
----------------
0000 | 0B
0001 | 14
0002 | 0A
WORD then occupies two bytes in memory, on x86 the least significant byte goes at lower address, most significant is at higher address.
So WORD 0x1234 is stored in memory at address 0xDEAD as:
address | value
----------------
DEAD | 34
DEAE | 12
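The same table can be reproduced with a few lines of Python, as a quick way to experiment with other values. This is just an illustration using the standard struct module; the dictionary standing in for memory is a made-up convenience, not how an assembler represents it.

```python
import struct

# "<H" means little-endian unsigned 16-bit, i.e. an x86 WORD.
word_bytes = struct.pack("<H", 0x1234)

# Place the bytes at consecutive addresses starting at 0xDEAD.
memory = {0xDEAD + i: b for i, b in enumerate(word_bytes)}

print("address | value")
for address, value in memory.items():
    print(f"{address:04X}    | {value:02X}")
```

The least significant byte 34 lands at DEAD and the most significant byte 12 at DEAE, as in the table above.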
Registers on x86 are a special tiny bit of memory located directly on the CPU itself, which is not addressable by numerical addresses like above, but only by the instruction opcode containing the number of the register (in source they are named ax, bx, ...).
That means you have no registers in your question, and it makes no sense to talk about registers in it.
In normal assembler [B+2] would be BYTE 40, (bytes at B are: 30, 0, 40, 0, 70, 0, 11, 0). In MASM it may be different, as it's trying to work with "variables" considering also their size, so [B+2] may be treated as WORD 70. I don't know for sure, and I don't want to know, MASM has too many quirks to be used logically, and you have to learn them. (just create short code with B WORD 0, 1, 2, 3, 4 MOV ax,[B+2] and check the disassembly in debugger).
[A+2] is 10. You are missing the point that [A] is [A+0]. Like in C/C++ arrays, indexing goes from 0, not from 1.
The rest of the answers can be easily figured out if you draw the bytes on paper (for example DWORD 0x310 compiles to the hex bytes 10 03 00 00).
I wonder where you got 0x15 in the first possible answer, as I don't see any value 21 in A.
edit due to new comments ... I will "compile" it for you, make sure you either understand every byte, or ask under answer which one is not clear.
; A BYTE 0xB, 0d20, 0d10, 0d13, 0x0C
A:
     0B 14 0A 0D 0C
; B WORD 0d30, 0d40, 0d70, 0hB
B:  ;▼     ▼     ▼     ▼
     1E 00 28 00 46 00 0B 00
; D DWORD 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600
D:  ;▼           ▼           ▼           ▼           ▼           ▼
     B0 00 00 00 00 02 00 00 10 03 00 00 00 04 00 00 00 05 00 00 00 06 00 00
Notice how A, B and D are just labels marking some address in memory; that's how most assemblers work with symbols. In MASM it's trickier, as it tries to be "clever" and keeps around not only the address, but also the fact that D was defined as DWORD and not BYTE. That's not the case with other assemblers.
Now [D+4] in MASM is tricky: it will probably use the size knowledge to default to DWORD size for that expression (in other assemblers you should specify it, like "DWORD PTR [D+4]", or it is deduced from the target register size automatically, when possible). So [D+4] will fetch bytes 00 02 00 00 = DWORD 00000200. (I just hope MASM doesn't also recalculate the +4 offset as the +4th dword, i.e. +16 in bytes.)
Now to your comments. I will tear them apart into tiny bits with mistakes: while it's often easy to understand what you meant, in assembly, once you start writing code, good intentions are not enough. You must be exact and accurate; the CPU will not fill any gap and will do exactly what you wrote.
Can you explain how did you get 0d13 of A and through to 0d30 of B @Jester?
Go to my "compiled" bytes: D-1 (when offsets are in bytes) means one byte back from the D: address, i.e. that 00 at the end of the B line. Now for D-10, count 10 bytes back from D: ... That will go to 0D in the A line, as 8 bytes are in the B array, and the remaining two are at the end of the A array.
Now if you read from that address 4 bytes: 0D 0C 1E 00 = DWORD 001E0C0D. (Jester mixed up decimal 13 into 13h by accident in his final "dword" value)
each value in B will occupy two "slots" as you count back? And each value in A will occupy four "slots"?
It's other way around, two values in B will form 1 DWORD slot, and four values in A will form 1 DWORD. Just as "D" data of 6 DWORD can be treated also as 12 WORD values, or 24 BYTE values. For example DWORD PTR [A+2] is 1E0C0D0A.
first value of A,B, and D are offsets x86 is little endian
"value of A" is actually some memory address. In such a case I automatically say "address", "pointer" or "label" rather than "value" (although "value of symbol A" is valid English, and can be resolved after symbols have addresses assigned).
OFFSET A has a particular special meaning in MASM: it takes the byte offset of address A from the start of its segment (in 32-bit mode this is usually the "address" for a human, as segments start from 0 and memory is flat-mapped. In real mode the segment part of the address was important, as the offset was only 16 bits wide, so only 64 KiB of memory was addressable through the offset alone).
In your case I would say "value at A", as "content of memory at address A". It's subtle, I know, but when everyone talks like this, it's clear.
B is 0d40
[B+2] is 40. B+2 is some address+2. B is some address. It's the [x] brackets marking "value from memory at x".
Although in MASM it's a bit different, it will compile mov eax,D as mov eax,DWORD PTR [D] to mimic "variable" usage, but that's specific quirk of MASM. Avoid using that syntax, it hides memory usage from unfocused reader of source, use mov eax,[D] even in MASM (or get rid of MASM ideally).
D...bytes (in hex) = 0x200, 0,0,0,...
0x200 is not a byte. Hex formatting has the neat feature that each pair of digits forms a single byte, so hex 200 is 3 digits => one and a half bytes.
Consider how those DWORD values were created from bytes.. in decimal formatting you would have to recalculate the whole value, so bytes 40,30,20,10 are 40 + 30*256 + 20*65536 + 10*16777216 = 169090600 -> the original values are not visible there. With hexa 28 1E 14 0A you just reassemble them in correct order 0A141E28.
D is 0x200.
No, D is address. And even [D] is 0xB0.
Count 10 bytes backwards from 0xb0. So wouldn't [D-10] be equal to 0x0C?
B0 is at D+0 address. You don't count it into those 10 bytes in [D-10], that B0 is zero bytes beyond D (D+0). Look at my "compiled" memory and count bytes there to get comfortable with offsets.
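To get comfortable with the offsets, the "compiled" bytes can also be rebuilt and probed in Python. This is a sketch under two assumptions taken from the discussion above: A, B and D are placed back to back in memory, and offsets inside the brackets count bytes. The helper read and the choice of address 0 for label A are made up for illustration; the sketch says nothing about how MASM itself scales offsets.

```python
import struct

# Assumption from the question: the three arrays are adjacent in memory.
A = struct.pack("<5B", 0x0B, 20, 10, 13, 0x0C)                    # BYTE
B = struct.pack("<4H", 30, 40, 70, 0x0B)                          # WORD
D = struct.pack("<6I", 0xB0, 0x200, 0x310, 0x400, 0x500, 0x600)   # DWORD
memory = A + B + D

addr_A = 0                   # hypothetical address of label A
addr_B = addr_A + len(A)     # labels are just addresses
addr_D = addr_B + len(B)


def read(address, size):
    """Read `size` bytes at `address` as a little-endian unsigned value."""
    return int.from_bytes(memory[address:address + size], "little")


print(" ".join(f"{b:02X}" for b in memory))
print("BYTE  [A+2]  =", read(addr_A + 2, 1))
print("WORD  [B+2]  =", read(addr_B + 2, 2))
print("WORD  [B-3]  =", f"{read(addr_B - 3, 2):04X}")
print("DWORD [D+4]  =", f"{read(addr_D + 4, 4):08X}")
print("DWORD [D-10] =", f"{read(addr_D - 10, 4):08X}")
print("DWORD [A+2]  =", f"{read(addr_A + 2, 4):08X}")
```

Changing the size argument for the same address shows how the same bytes can be viewed as BYTE, WORD or DWORD values.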

Why static string in .rodata section has a four dots prefix in GCC?

For the following code:
#include <stdio.h>
int main() {
    printf("Hello World");
    printf("Hello World1");
    return 0;
}
the generated assembly for calling printf is as follows (64 bits):
400474: be 24 06 40 00 mov esi,0x400624
400479: bf 01 00 00 00 mov edi,0x1
40047e: 31 c0 xor eax,eax
400480: e8 db ff ff ff call 400460 <__printf_chk@plt>
400485: be 30 06 40 00 mov esi,0x400630
40048a: bf 01 00 00 00 mov edi,0x1
40048f: 31 c0 xor eax,eax
400491: e8 ca ff ff ff call 400460 <__printf_chk@plt>
And the .rodata section is as follows:
Contents of section .rodata:
400620 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
400630 48656c6c 6f20576f 726c6431 00 Hello World1.
Based on the assembly code, the first call to printf has an argument with address 400624, which is at a 4-byte offset from the start of .rodata. I know it skips the first 4 bytes, shown as the four-dot prefix here. But my question is: why does GCC or the linker produce this prefix for a string in .rodata? I am using GCC 4.8.4 on Ubuntu 14.04. The compilation command is just: gcc -Ofast my-source.c -o my-program.
For starters, those are not four dots, the dot just means unprintable character. You can see in the hex dump that those bytes are 01 00 02 00.
The final program contains other object files added by the linker, which are part of the C runtime library. This data is used by code there.
You can see the address is 0x400620. You can then try to find a matching symbol, for example you can load it into gdb and use the info symbol command:
(gdb) info symbol 0x4005f8
_IO_stdin_used in section .rodata of /tmp/a.out
(Note I had a different address.)
Taking it further, you can actually find the source for this in glibc:
/* This records which stdio is linked against in the application. */
const int _IO_stdin_used = _G_IO_IO_FILE_VERSION;
and
#define _G_IO_IO_FILE_VERSION 0x20001
Which corresponds to the value you see if you account for little-endian storage.
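The little-endian reading can be checked quickly in Python; this is just a verification of the four bytes from the hex dump in the question, not part of glibc or GCC.

```python
# First four bytes of .rodata from the dump: 01 00 02 00
raw = bytes.fromhex("01000200")

# x86-64 stores integers least significant byte first.
value = int.from_bytes(raw, "little")
print(hex(value))
```

The result is 0x20001, the value of _G_IO_IO_FILE_VERSION quoted above.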
It does not prefix the data. The .rodata can contain anything. The first four bytes are [seemingly] a string, but it just happens to link there (i.e. it's for something else). It is unrelated to your "Hello World"

Where do static local variables go

Where are static local variables stored in memory? Local variables can be accessed only inside the function in which they are declared.
Global static variables go into the .data segment.
If a static global and a static local variable have the same name, how does the compiler distinguish them?
Static variables go into the same segment as global variables. The only thing that's different between the two is that the compiler "hides" all static variables from the linker: only the names of extern (global) variables get exposed. That is how compilers allow static variables with the same name to exist in different translation units. Names of static variables remain known during the compilation phase, but then their data is placed into the .data segment anonymously.
A static variable is very similar to a global variable: an uninitialized static variable goes in BSS and an initialized static variable goes in the data segment.
As mentioned by dasblinken, GCC 4.8 puts local statics in the same place as globals.
More precisely:
static int i = 0 goes in .bss
static int i = 1 goes in .data
Let's analyze one Linux x86-64 ELF example to see it ourselves:
#include <stdio.h>
int f() {
    static int i = 1;
    i++;
    return i;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", f());
    return 0;
}
To reach conclusions, we need to understand the relocation information. If you've never touched that, consider reading this post first.
Compile it:
gcc -ggdb -c main.c
Disassemble the code with:
objdump -S main.o
f contains:
int f() {
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
static int i = 1;
i++;
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
a: 83 c0 01 add $0x1,%eax
d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 13 <f+0x13>
return i;
13: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 19 <f+0x19>
}
19: 5d pop %rbp
1a: c3 retq
Which does 3 accesses to i:
4 loads i into eax to prepare for the increment
d stores the incremented value back to memory
13 loads i into eax again for the return value. It is obviously unnecessary since eax already contains it, and -O3 is able to remove that.
So let's focus just on 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
Let's look at the relocation data:
readelf -r main.o
which says how the text section addresses will be modified by the linker when it is making the executable.
It contains:
Relocation section '.rela.text' at offset 0x660 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000006  000300000002 R_X86_64_PC32     0000000000000000 .data - 4
We look at .rela.text and not the others because we are interested in relocations of .text.
Offset 6 falls right into the instruction that starts at byte 4:
4: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # a <f+0xa>
         ^^
         This is offset 6
From our knowledge of x86-64 instruction encoding:
8b 05 is the mov part
00 00 00 00 is the address part, which starts at byte 6
AMD64 System V ABI Update tells us that R_X86_64_PC32 acts on 4 bytes (00 00 00 00) and calculates the address as:
S + A - P
which means:
S: the value of the symbol the relocation refers to, here the start of the .data section
A: the addend: -4
P: the address of byte 6 when loaded
-P is needed because GCC used RIP relative addressing, so we must discount the position in .text
-4 is needed because RIP points to the following instruction at byte 0xA but P is byte 0x6, so we need to discount 4.
Conclusion: after linking it will point to the first byte of the .data segment.
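The formula can be tried out with concrete numbers. The sketch below is only an illustration: the base addresses for .text and .data are hypothetical values picked for the example, f is assumed to sit at the very start of .text, and the helper name pc32_field is made up. It computes the value the linker would store in the four bytes at offset 6 and then mimics what the CPU does with it.

```python
def pc32_field(S, A, P):
    """Value stored for R_X86_64_PC32: S + A - P (must fit in signed 32 bits)."""
    value = S + A - P
    assert -2**31 <= value < 2**31, "relocation overflow"
    return value


text_base = 0x401000   # hypothetical address of f (start of .text)
data_base = 0x404000   # hypothetical address of the start of .data

P = text_base + 0x6    # address of the 4-byte field being patched
disp = pc32_field(S=data_base, A=-4, P=P)

# At run time RIP points to the next instruction, at f+0xa.
rip = text_base + 0xA
accessed = rip + disp

print(hex(disp), hex(accessed))
```

Whatever base addresses are chosen, rip + disp comes out equal to the start of .data, because the -4 addend cancels the 4-byte gap between the patched field and the next instruction.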
