c statements to assembly - c

For an assignment I need to translate some C declarations to assembly using only AVR directives.
I'm wondering if anyone could give me some advice on learning how to do this.
For example:
translate 'char c;' and 'char* d;' to assembly statements
Please note, this is the first week I'm learning assembly.
Any help/advice would be appreciated.

First, char c; and char* d; are declarations, not statements.
What you can do is dump the assembly output of your C program with the avr-gcc option -S:
# Dump assembly to stdout
avr-gcc -mmcu=your_avr_mcu -S -c source.c -o -
Then you can reuse the relevant parts of the assembly output to create inline assembler statements.
Look here for how to write inline assembler with avr-gcc:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
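As a starting point, a minimal source file to feed to that command might look like the sketch below. The file name source.c matches the command above; the two helper functions are made up for illustration and exist only so that the variables are actually used.

```c
/* source.c: the two declarations from the question, at file scope. */
#include <stddef.h>

char c;    /* one byte of storage */
char *d;   /* storage for an address (two bytes on most AVR parts) */

/* Hypothetical helper: make d point at c. */
void point_d_at_c(void)
{
    d = &c;
}

/* Hypothetical helper: write through d if it points somewhere. */
void store_through_d(char value)
{
    if (d != NULL) {
        *d = value;
    }
}
```

In the -S output, look for the lines that mention c and d outside of any function. Depending on the compiler version these are likely to be storage-reserving directives (for example a .comm line, or a label in .bss followed by .zero), which is probably what the assignment is after.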

Without a compiler that you can disassemble from (avr-gcc is easy to come by), it may be difficult to try to understand what happens when a high level language is compiled.
You are simply declaring that you want a variable or an address when you use declarations like that. That doesn't necessarily, immediately, place something in assembly. Often dead code and other things are removed by the compiler. Other times it is not until the very end of the compile process that you know where your variable may end up. Sometimes a char only ever lives in a register, so that variable has a home only for a short period of the program. Sometimes the variable has a longer life and needs a home the whole time the program is running; there are not enough registers to keep it in one forever, so it gets a memory location allocated to it.
Likewise a pointer is an address, which also lives in registers or in memory locations.
Whether or not you have a compiler where you can experiment with adding C code and seeing what happens, you need to get the instruction set documentation for the desired processor family.
http://www.atmel.com/Images/doc0856.pdf
Look at the add operation for example add rd,rr, and it shows you that d and r are both between 0 and 31 so you could have add r10,r23. And looking at the operation that means that r10 = r10 + r23. If you had char variables that you wanted to add together in C this is one of the instructions the compiler might use.
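To make that concrete, here is a small C function of the kind that could plausibly compile down to a single add on AVR. This is only a sketch; the name add8 is made up, and what the compiler actually emits depends on the optimization level and calling convention.

```c
#include <stdint.h>

/* Add two 8-bit values; the result wraps modulo 256, as an 8-bit register would. */
uint8_t add8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

Compiling this with the -S option and looking for an add between two registers is a quick way to check the guess against your own toolchain.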
There are two lds instructions: a single 16-bit-word version and one that takes two 16-bit words (usually the assembler chooses for you). Rd is between 0 and 31 and k is an address in the memory space. If you have a global variable declared, it will likely be accessed using lds or sts. So k is a pointer, a fixed pointer. Your char * in C can turn into a fixed number at compile time, depending on what you do with that pointer in your code. If it is a dynamic thing, then look at the flavors of ld and st that use register pairs. So you might see a char * pointer turn into a pair of registers or a pair of memory locations that hold the pointer itself. The code might then use the X, Y, or Z register pairs (see ld and st), and maybe an adiw to add an offset to that pointer before using ld or st.
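The difference between a fixed address and a run-time pointer can be seen from the C side with a sketch like the following. All names here are hypothetical, and the instructions mentioned in the comments are what one might expect from an AVR compiler, not a guarantee.

```c
#include <stddef.h>

char counter;   /* global: its address is fixed when the program is linked */

/* Address known at build time: a candidate for lds on AVR. */
char read_counter(void)
{
    return counter;
}

/* Address only known at run time: a candidate for ld through X, Y or Z. */
char read_through(const char *p)
{
    return *p;
}

/* Pointer plus offset: the compiler may adjust the register pair first. */
char read_at(const char *p, size_t offset)
{
    return p[offset];
}
```

Comparing the generated assembly of read_counter and read_through side by side shows how the same C type, char, is reached in two different ways.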
I have a simulator, http://github.com/dwelch67/avriss, but it needs work and is not fully debugged (unless you want to learn the instruction set through examining a simulator and its potential bugs). simavr and some others are out there that you can use to watch your code execute: http://gitorious.org/simavr

Related

When initializing variables in C does the compiler store the initial value in the executable

When initializing variables in C, I was wondering: does the compiler make it so that the value is already set when the code is loaded at run time, or does it have to make an explicit assembly call to set it to an initial value?
The first one would be slightly more efficient because you're not making a second CPU call just to set the value.
e.g.
void foo() {
    int c = 1234;
}
A compiler is not required to do either of them. As long as the behavior of the program stays the same it can pretty much do whatever it wants.
Especially when using optimization, crazy stuff can happen. Looking at the assembly code after heavy optimization can be confusing to say the least.
In your example, both the constant 1234 and the variable c would be optimized away since they are not used.
If it's a variable with static lifetime, it'll typically become part of the executable's static image, which'll get memcpy'ed, along with other statically known data, into the process's allocated memory when the process is started/loaded.
void take_ptr(int*);
void static_lifetime_var(void)
{
    static int c = 1234;
    take_ptr(&c);
}
x86-64 assembly from gcc -Os:
static_lifetime_var:
mov edi, OFFSET FLAT:c.1910
jmp take_ptr
c.1910:
.long 1234
If it's unused, it'll typically vanish:
void unused(void)
{
    int c = 1234;
}
x86-64 assembly from gcc -Os:
unused:
ret
If it is used, it may not be necessary to put it into the function's frame (its local variables on the stack)—it might be possible to directly embed it into an assembly instruction, or "use it as an immediate":
void take_int(int);
void used_as_an_immediate(int d)
{
    int c = 1234;
    take_int(c*d);
}
x86-64 assembly from gcc -Os:
used_as_an_immediate:
imul edi, edi, 1234
jmp take_int
If it is used as a true local, it'll need to be loaded into stack-allocated space:
void take_ptr(int*);
void used(int d)
{
    int c = 1234;
    take_ptr(&c);
}
x86-64 assembly from gcc -Os:
used:
sub rsp, 24
lea rdi, [rsp+12]
mov DWORD PTR [rsp+12], 1234
call take_ptr
add rsp, 24
ret
When pondering these things, Compiler Explorer along with some basic knowledge of assembly are your friends.
TL;DR: Your example declares and initializes an automatic variable. It has to be initialized each time the function is called. So there will be some instruction to do this.
As an adjusted duplicate of my answer to How compile time initialization of variables works internally in c?:
The standard defines no exact way of initialization. It depends on the environment your code is developed and run on.
How variables are initialized depends also on their storage duration. You didn't mention it in the text, your example is an automatic variable. (Which is most probably optimized away, as commenters point out.)
Initialized automatic variables will be written each time their declaration is reached. The compiled program executes some machine code for this to happen.
Static variables are always initialized, and only once, before program startup.
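The difference is observable from C itself. In this small sketch (the function names are made up), both variables start at 1234, but only the static one keeps its value between calls:

```c
/* Automatic storage: c is written every time the declaration is reached. */
int bump_automatic(void)
{
    int c = 1234;
    c++;
    return c;
}

/* Static storage: c gets its initial value once, before the first call. */
int bump_static(void)
{
    static int c = 1234;
    c++;
    return c;
}
```

Calling bump_automatic twice gives the same result both times, while the second call to bump_static returns one more than the first.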
Examples from the real world:
Most (if not all) PC systems store the initial values of explicitly (and not zero-) initialized static variables in a special section called data that is loaded by the system's loader to RAM. That way those variables get their values before the program startup. Static variables not explicitly initialized or with zero-like values are placed in a section bss and are filled with zeroes by the startup code before program startup.
Many embedded systems have their program in non-volatile memory that can't be changed. On such systems the startup code copies the initial values of the section data into its allocated space in RAM, producing a similar result. The same startup code zeroes also the section bss.
Note 1: The sections don't have to be named like this. But it is common.
This startup code might be part of the compiled program, or might not. It depends, see above. But speaking of efficiency, it doesn't matter which program initializes variables. It just has to be done.
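What such startup code does can be sketched in plain C. Real startup code usually gets the region addresses and sizes from linker-defined symbols; here they are passed in as ordinary parameters, and the function name init_memory is an assumption made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the two startup steps described above:
 *  1. copy the stored initial values of the data section into RAM,
 *  2. fill the bss section with zeroes.
 */
void init_memory(unsigned char *data_ram, const unsigned char *data_image,
                 size_t data_size,
                 unsigned char *bss_ram, size_t bss_size)
{
    memcpy(data_ram, data_image, data_size);  /* step 1: data */
    memset(bss_ram, 0, bss_size);             /* step 2: bss  */
}
```

On a PC the loader typically performs the first step while mapping the executable, so only the embedded case needs an explicit copy like this.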
Note 2: There are more kinds of storage duration, please see chapter 6.2.4 of the standard.
As long as the standard is met, a system is free to implement any other kind of initialization, including writing the initial values into their variables step by step.
Firstly, it's important to have a common understanding of the word 'compiler', else we can end up arguing endlessly.
In simple words,
a compiler is a computer program that translates computer code written
in one programming language (the source language) into another
programming language (the target language). The name compiler is
primarily used for programs that translate source code from a
high-level programming language to a lower level language (e.g.,
assembly language, object code, or machine code) to create an
executable program.
(Ref: Wikipedia)
With this common understanding, let's now answer your question: yes, the final code contains an explicit assembly instruction to set the initial value, for any kind of variable. This is because variables are ultimately either stored in some memory location or live in a CPU register. The latter happens when there are few enough variables to fit into the CPU registers, as with your code snippet running on, let's say, most modern servers (side note: different systems have different numbers of registers).
For the variables that are stored in registers, there has to be a mov (or equivalent) kind of instruction to load that initial value into the register. Without such an instruction the assigned register cannot be assigned with the intended value.
For the variables that are stored in memory, depending on the architecture and the compiler efficiency, the init value has to be somehow pushed to the given/assigned address, which takes at least a couple of asm instructions.
Does this answer your question?

C Function to Step through its own Assembly

I am wondering if there is a way to write a function in a C program that walks through another function to locate the address at which an instruction occurs.
For example, I want to find the address of the ret instruction in the main function.
My first thought is to write a while loop that begins at "&main()" and on each iteration increments the address by 1 until the instruction at the current address is "ret", and then returns that address.
It is certainly possible to write a program that disassembles machine code. (Obviously, this is architecture-specific. A program like this works only for the architectures it is designed for.) And such a program could take the address of its main routine and examine it. (In some C implementations, a pointer to a function is not actually the address of the code of the function. However, a program designed to disassemble code would take this into account.)
This would be a task of considerable difficulty for a novice.
Your program would not increment the address by one byte between instructions. Many architectures have a fixed instruction size of four bytes, although other sizes are possible. The x86-64 architecture (known by various names) has variable instruction sizes. Disassembling it is fairly complicated. As part of the process of disassembling an instruction, you have to figure out how big it is, so you know where the next instruction is.
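A small sketch shows why stepping one byte at a time goes wrong on x86. It scans an ordinary byte array rather than real function memory, so it is portable; the function name and the sample array are made up for illustration. The sample bytes encode mov eax, 0xC3 followed by ret, and 0xC3 is also the one-byte encoding of ret.

```c
#include <stddef.h>

/* Return the offset of the first byte equal to `opcode`, or -1 if none. */
long find_first_byte(const unsigned char *code, size_t len,
                     unsigned char opcode)
{
    for (size_t i = 0; i < len; i++) {
        if (code[i] == opcode) {
            return (long)i;
        }
    }
    return -1;
}

/* x86 machine code for: mov eax, 0xC3 ; ret */
const unsigned char sample[] = { 0xB8, 0xC3, 0x00, 0x00, 0x00, 0xC3 };
```

Searching sample for 0xC3 reports offset 1, which lies inside the immediate operand of the mov. The actual ret is at offset 5, and only a scan that knows the length of each instruction would find it.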
In general, though, it is not always feasible to determine which return instruction is the one executed by main when it is done. Although functions are often written in a straightforward way, they may jump around. A function may have multiple return statements. Its code may be in multiple non-contiguous places, and it might even share code with other functions. (I do not know if this is common practice in common compilers, but it could be.) And, of course main might not ever return (and, if the compiler detects this, it might not bother writing a return instruction at all).
(Incidentally, there is a mathematical proof that it is impossible to write a program that always determines whether a program terminates or not. This is called the Halting Problem.)

How do 'C' statements execute inside memory

Suppose I have this piece of C code here:
int main() {
    int a = 10;
    printf("test1");
    printf("test2");
    someFunc();
    a = 20;
    printf("%d",a);
}
Well I think that all these statements are stored on the stack at a time and then popped one by one to get executed. Am I correct? If not, please correct me.
Not really no. The C standard doesn't mention a stack so your notions are ill-conceived.
A C compiler (or interpreter for that matter) can do anything it likes so long as it follows the C standard.
In your case, that could mean, among other things, (i) the removal of a altogether, since it's only used to output 20 at the end of the function, and (ii) someFunc() could be removed if there are no side effects in doing so.
What normally happens is that your code is converted to machine code suitable for the target architecture. These machine code instructions do tend to follow the code quite faithfully (C in this sense is quite "low level") although modern compilers will optimise aggressively.
The C standard doesn't specify where things go in memory.
However, computers work like this in general:
Your program consists of various memory sections/segments, usually they are called .data, .bss, .rodata, .stack, .text and there is possibly also a .heap. The .text section is for storing the actual program code, the rest of the sections are for storing variables. More info on Wikipedia.
The C code is translated to machine code (assembler) by the compiler. All code is stored inside the .text section, which is read-only memory.
Modern computers can "mark" memory as either code or data, and will be capable of generating hardware exceptions if you try to execute code in the data section, or treat the code section as data. This way the processor can assist in catching bugs like dangling pointers or run-away code.
In theory you could execute code from any memory section, even on the stack, but because of the previously mentioned feature, this is not typically done. Most often, it wouldn't make any sense to do so anyhow.
So for your specific code snippet, the only thing that is stored on the stack is the variable a. Or more likely, it is stored in a CPU register, for performance reasons.
The string literals "test1", "test2" and "%d" will be stored in the .rodata section.
The literal 20 could either be stored in the .rodata section, or more likely, merged into the code and therefore stored in .text together with the rest of the code.
The program counter determines which part of the code that is currently executed. The stack is not involved in that what-so-ever, it is only for storing data.
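As a rough illustration of that layout, the comments in the sketch below name the section each object would typically end up in on a common toolchain. The names are hypothetical, and the placement is an assumption about typical behavior, not something the C standard guarantees.

```c
int initialized_global = 10;       /* typically .data   */
int zeroed_global;                 /* typically .bss    */
const char greeting[] = "test1";   /* typically .rodata */

/* The machine code of the function itself lives in .text. */
int sum_locals(void)
{
    int a = 20;                    /* stack slot or CPU register */
    return a + initialized_global + zeroed_global;
}
```

A linker map file or a tool such as objdump can be used to check where each symbol actually landed for a given build.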
What's worth noting is that the C standard does not mandate implementation. So as long as the output is correct according to the standard, the compiler is free to implement this as it chooses.
What most likely happens for you is that the compiler translates this bit of C into assembler code which is then executed top to bottom.
a = 20;
Is most likely optimised out unless you use it somewhere else. Good compilers also throw a warning for you like:
Warning: Unused Variable a

Assigning (const char *) to function pointer executing a hex code

I found a C code that looks like this:
#include <stdio.h>
char code[] =
"\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x46\x61\x74\x61\x75\xf2\x81\x7e"
"\x08\x45\x78\x69\x74\x75\xe9\x8b\x7a\x24\x01\xc7\x66\x8b\x2c"
"\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf\xfc\x01\xc7\x68\x72\x6c"
"\x64\x01\x68\x6c\x6f\x57\x6f\x68\x20\x48\x65\x6c\x89\xe1\xfe"
"\x49\x0b\x31\xc0\x51\x50\xff\xd7";
int main(void)
{
    int (*func)();
    func = (int(*)()) code;
    (int)(*func)();
    return 0;
}
For the given hex code this program runs well and prints "HelloWorld". I was thinking that the hex code is some machine instructions, and that by calling a function pointer that points to that code we are executing it.
Was my thought right? Is there something to improve?
How does this hex code get generated?
Thanks in advance.
You are correct that by forcing a function pointer like this you are calling into machine instructions written as a hexadecimal string variable.
I doubt that a program like this would work on any CPU since about 2005.
On most RISC CPUs (like ARM) and on all Intel and AMD CPUs that support 64-bit, memory pages have a No Execute bit. Or in reverse an Execute bit.
On memory pages that do not have an Execute bit, the CPU will not run code. Compilers do not put variables into executable memory pages.
In order to run injected shell codes, attackers now have to use "return into libc" or function pointer overwrite attacks which set things up to call mprotect or VirtualProtect to set the execute bit on their shell code. Either that or get it injected into an executable space such as the one the Java, .NET, or JavaScript JIT compiler uses.
Security hardened kernels will deny the ability to call mprotect. Once the program's address space is set by the dynamic library loader, it sets a security flag and no new executable pages can be created.
In order to make it always work, you could allocate some executable, read-write memory with VirtualAlloc or the like, put the code in there, and then execute it. Then there won't be any access violation faults.
#include <windows.h>
#include <cstring>
#include <iostream>

// `code` is the byte array from the question.
int main(int argc, char** argv)
{
    void* PointerToNewMemoryRegion = 0;
    void (*FunctionPointer) ();
    // NULL lets the system choose where to place the new region.
    PointerToNewMemoryRegion = VirtualAlloc(NULL, 113, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (PointerToNewMemoryRegion == NULL)
    {
        std::cout << "Failed to Allocate Memory region Error code: " << GetLastError();
        return 1;
    }
    // Copy the machine code into the executable region, then call it.
    memcpy(PointerToNewMemoryRegion, code, 113);
    FunctionPointer = (void(*)()) PointerToNewMemoryRegion;
    (void)(*FunctionPointer) ();
    VirtualFree(PointerToNewMemoryRegion, 113, MEM_DECOMMIT);
    return 0;
}
But the injected code never returns to my code, so my last line is pointless. So my code has a memory leak.
To ask this question from a "general C" point of view isn't all that meaningful.
First of all, your code has many major problems:
The literal "\xFF\xFF\xFF" equals 0xFFFFFF00, not 0x00FFFFFF as may or may not have been the intention.
What this hex code means and if it is at all meaningful, is endian-dependent and also depends on the address bus width of the given CPU.
As others have mentioned, casts between function pointers and regular pointers aren't supported or well-defined by C; the C standard lists them as a "common extension".
That being said, code like this has about one single purpose, and that is various forms of boot loaders and self-updating software used in embedded systems.
Suppose for example that you have a boot loader program that is tasked with re-programming something in the very same segment of flash memory where said program itself is executed from. That is impossible because of the way the memory hardware works. So in order to do so, you would have to execute the actual flash programming routine from RAM. Since the array of hex gibberish is stored in RAM, the program can execute from there with the function pointer trick, assuming that the C compiler has a non-standard extension that allows the cast.
As for how to generate the code, you either write it all in assembler and then translate the assembler instructions to opcodes manually (very tedious), or, more likely, you write the function in C, disassemble it, and copy/paste the opcodes from the disassembly.
The latter is more dangerous though, as the critical part of getting code like this to work is calling convention: you must be absolutely sure that the function stacks/unstacks things properly when it is called and when it is done, restoring the contents of any CPU registers used etc. Which may force you to write part of the function in assembler anyhow. Needless to say, the code will be completely non-portable.

Initialise Stack pointer on ATtiny2313

I am programming an ATtiny2313 using avrdude and a makefile. I believe the stack pointer is not properly initialised, since when I call a function, the program appears to freeze. I found the following assembly code:
.include "tn2313def.inc"
ldi r16, low(RAMEND) ; Main program start
out SPL,r16 ;Set Stack Pointer to top of RAM
which I think might work, but I don't know how I can incorporate it into the C code that I created, i.e. do I need to include a special header file or somehow denote that it is assembly and not C? I am relatively new to programming and I would appreciate any help, either as to how to implement this code properly or another way of making my current C code initialise a stack pointer.
Thank you in advance.
Stephen
It really depends on how you've got your makefile configured as to whether the stack pointer will be initialised. If you're using gcc and the normal compile and link options, the linker ensures that some startup code crtX.o is also included in your executable. The linker automatically chooses the correct crtX.o file for your processor and compile options.
Amongst other things, the code in the crtX.o files will clear the bss segment to be all zeros as required by the C standard, configure your stack pointer and provide interrupt vectors in the correct location for those which have not been overridden.
Remember that the ATTiny2313 only has 128 bytes of SRAM. This area must be big enough for any initialised data you have in your program and the stack. Just the process of calling a simple function will use up quite a number of bytes of RAM to save the registers on the stack before calling the function.
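The arithmetic behind that warning is simple enough to write down. The helper below is only an illustrative sketch with a made-up name; the data and bss sizes are numbers you would read from the linker map or a size report for your own build.

```c
/* Total SRAM on the ATtiny2313, as stated above. */
#define ATTINY2313_SRAM_BYTES 128u

/*
 * Bytes left over for the stack once statically allocated data is
 * accounted for. Returns 0 if the static data alone already fills
 * or exceeds the available SRAM.
 */
unsigned stack_bytes_available(unsigned data_bytes, unsigned bss_bytes)
{
    unsigned used = data_bytes + bss_bytes;
    return used >= ATTINY2313_SRAM_BYTES ? 0u : ATTINY2313_SRAM_BYTES - used;
}
```

For example, 40 bytes of initialised data plus 30 bytes of bss would leave 58 bytes for every return address, saved register and local variable in the deepest call chain.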
So, I'd suggest to do these things:
Use the standard makefile if one is provided by your compiler, it will ensure that the standard startup code is included and that the stack/RAM is set up correctly before main() is called.
Turn on the linker map and symbol file output and verify that you actually have some space free that can be used for the stack.
The Atmel IDE has a reasonable simulator, so try running your code in the simulator. You'll be able to watch stack usage as you are calling the function and locate any odd behaviour.
You may just happen to have a stack overflow (which is why you came to stackoverflow.com, right?).
