I want to fetch and decode an instruction at address X. After that I want to increment the address by 4 and then execute the decoded instruction. The registers are 32-bit, big-endian. I am not asking for a solution, just a pointer or tips on how to do this in C, or any good guides to follow.
You probably want assembly for this, not C. You could link assembly code into a C program, but you shouldn't write that in C.
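If the aim is to interpret the instruction in software (as an emulator would) rather than run it natively, the fetch step can be sketched in portable C. This is only a sketch under assumptions: memory is modeled as a byte array, and the names cpu_state, fetch_be32 and opcode_field, as well as the 6-bit opcode field, are hypothetical and not taken from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state: only a program counter is modeled here. */
typedef struct {
    uint32_t pc;
} cpu_state;

/* Fetch one 32-bit big-endian instruction word at cpu->pc,
 * then advance pc by 4 so it points at the next instruction. */
static uint32_t fetch_be32(cpu_state *cpu, const uint8_t *mem, size_t mem_size)
{
    assert(mem_size >= 4 && cpu->pc <= mem_size - 4);
    const uint8_t *p = mem + cpu->pc;
    /* Assemble the word byte by byte, so host endianness does not matter. */
    uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    cpu->pc += 4;
    return word;
}

/* Assumed layout, for illustration only: the top 6 bits hold the opcode. */
static uint32_t opcode_field(uint32_t word)
{
    return word >> 26;
}
```

Decoding then becomes a switch on opcode_field(word), with the execute step run after pc has already been incremented. The real field layout has to come from the instruction set manual of the target CPU.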
In advance: I do not want any 'ready-to-use' solution; IMHO it would defeat the purpose of learning something. My primary goal is to get a few explanations or hints, or a deeper understanding.
Now to the problem:
After using gdb and setting a breakpoint, the following dump of the stack is generated (C program):
0xbfa62f84: 0x08048350 0xbfa62fe8 0xb7df0390 0x00000001
0xbfa62f94: 0xbfa63014 0xbfa6301c 0xb7f262d0 0x00000000
The question that emerges now is: what do these values stand for, and how can they be disassembled or decomposed?
I assume that they encode a memory address plus some opcode like mov, sub, etc.
But how, and why? Or, asked differently: how can these instructions be 'read out'?
Thanks in advance
Dan
If you want to understand such a flow, use a debugger like Keil. There you can see the assembly code, the generated hex file, and your source code at the same time. When you step through the code, you will see how the assembly relates to the hex file and the source code.
Machine code is not stored in the stack; however, the return address stored in the stack frame points to machine code. 0x08048350 is a good candidate for a code address (on x86, the code segment starts at a low address); you can examine the memory starting at that address and try to puzzle out opcodes and registers.
Or you could use the gdb command x/i to display the instructions starting at that address - x/16i 0x08048350 will display the first 16 instructions starting at that address.
I'm new to programming CPUs with small/medium memory models. I am working with an embedded processor that has 256KB of flash code space at addresses 0x00000 to 0x3FFFF, and 20KB of RAM at addresses 0xF0000 to 0xFFFFF. There are compiler options to choose between small, medium, or large memory models. I have medium selected. My question is: how does the compiler differentiate between a code/flash address and a RAM address?
Take for example I have a 1 byte variable at RAM address 10, and I have a const variable at the real address 10. I did something like:
value = *((unsigned char *)10);
How would the compiler choose between the real address 10 and the (virtual?) address 10? I suppose that if I wanted the value at real address 10, I would use something like this (is that right?):
value = *((const unsigned char *)10);
Also, can you explain the following code which I believe is related to the answer:
uint32_t var32; // 32 bit unsigned integer.
unsigned char *ptr; // 2 byte pointer.
ptr = (unsigned char *)5;
var32 = (uint32_t)ptr;
printf("%lu", var32);
The code prints 983045 (0xf0005 hex). It seems unrealistic, how can a 16 bit variable return a value greater than what 16 bits can store?
Read your compiler's documentation to find out details about each memory model.
It may have various sorts of pointer, e.g. char near * being 2-byte, and char far * being 4-byte. Alternatively (or as well as), it might have instructions for changing code pages which you'd have to manually invoke.
how can a 16 bit variable return a value greater than what 16 bits can store?
It can't. Your code converts the pointer to a 32-bit int, and 0xF0005 can fit in a 32-bit int. Based on your description, I'd guess that char * only points into the data area, and you would use a different sort of pointer to point into the code area.
I tried to comment on Matt's answer but my comment was too long, and I think it might be an answer, so here's my comment:
I think this is an answer, I'm really looking for more details though. I've read the manual but it doesn't have much information on the topic. You are right, the compiler has near/far keywords you can use to manually specify the address (type?). I guess the C compiler knows if a variable is a near or far pointer, and if it's a near pointer it generates instructions that map the 2 byte near pointer to a real address; and these generated mapping instructions are opaque to the C programmer. That would be my only guess. This is why the pointer returns a value greater than its 16 bit value; the compiler is mapping the address to an absolute address before it stores the value in var32. This is possible because 1) the RAM addresses begin at 0xF0000 and end at 0xFFFFF, so you can always map a near address to its absolute address by or'ing the address with 0xF0000, and 2) there is no overlap between a code (far) pointer and a near pointer or'd with 0xF0000. Can anyone confirm?
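The guessed mapping can be written out in C to check it against the number that was printed. This is only a model of the guess above, not the compiler's documented behavior; RAM_BASE and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the RAM window, taken from the addresses in the question. */
#define RAM_BASE 0xF0000UL
#define RAM_END  0xFFFFFUL

/* Map a 16-bit near (data) pointer value to an absolute address
 * by OR-ing in the RAM base, as guessed above. */
static uint32_t near_to_absolute(uint16_t near_ptr)
{
    return (uint32_t)(RAM_BASE | near_ptr);
}

/* Inverse mapping: only valid for addresses inside the RAM window. */
static uint16_t absolute_to_near(uint32_t abs_addr)
{
    assert(abs_addr >= RAM_BASE && abs_addr <= RAM_END);
    return (uint16_t)(abs_addr & 0xFFFFu);
}
```

With this model, near_to_absolute(5) gives 0xF0005, which is 983045 in decimal, the value printed in the question. Looking at the assembly generated for the cast would show whether the compiler really does this.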
My first suggestion would be to read the documentation; however, as I see, you have already done that.
So my assumption is that you have ended up working on, for example, a large existing codebase that was developed with a not too widely supported compiler for a not too well known architecture.
In such a case (after all my attempts at acquiring proper documentation had failed), my approach would be to generate assembler output for test programs and analyse it. I did this a while ago, so this is not from thin air (it was an 8051 PL/M compiler running on an MDS-70, which was emulated by a DOS-based emulator from the late 80s, for which DOS was emulated by DOSBox; yes, and for the huge codebase we needed to maintain, we couldn't get around this mess).
So build simple programs that do something with some pointers, compile them without optimizations to assembly (or request an assembly dump, whatever the compiler can do for you), and understand the output. Try to cover all pointer types and memory models that your compiler supports. This will clarify what is happening, and hopefully the existing documentation will also be more helpful once you understand its gaps this way. Finally, don't stop at understanding just enough for the immediate problem; try to document the gaps properly, so that later you won't need to redo the experiments to figure out things you had once almost worked out.
To preface this: yes, this is a project to take control of an executable externally. No, I do not have any malicious intent with this; the end result of this project won't be anything useful anyway. I am writing this in cygwin on a 32-bit installation of XP.
What I need to do is change the first few bytes of a COM file to be a jump instruction so that, on execution, it will jump to the very end of the COM file. I have looked in assembler manuals to find what the bytes of that instruction would be so that I can just hard-code it in C, but have had no luck.
First Question: Can I do this in C? It seems to me like I could just insert opcodes at the beginning of any COM file so that it would execute them instead of the original start of the COM file.
Second Question: Does someone know where I can find a resource for opcodes so that I can insert them in my file? Or does anyone know what the bytes would be for a jump instruction?
If you have any question about the authenticity of this, feel free to ask.
The Intel® 64 and IA-32 Architectures Software Developer Manual Volume 2A Instruction Set Reference explains the encoding of the JMP instruction (real mode is a subset of IA-32).
For a 16-bit near jump (within the current code segment) you'd use 0xE9 followed by the 16-bit relative offset to jump to, stored low byte first. If your jump is the first bytes of the COM file then the offset will be relative to address 0x103: the first instruction of a COM file is always loaded at address 0x100, and the jump is relative to the instruction following the 3-byte jump.
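The offset arithmetic described above can be sketched in C as follows. The function name encode_com_jump and its parameters are hypothetical; the sketch only builds the three bytes and leaves reading and writing the file to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COM_LOAD_OFFSET 0x100u  /* a COM image is loaded at offset 0x100 */
#define JMP_NEAR_LEN    3u      /* opcode 0xE9 plus a 16-bit displacement */

/* Build "JMP rel16" for the very start of a COM image so that execution
 * continues at target_file_offset (a 0-based offset into the file). */
static void encode_com_jump(uint8_t out[3], uint16_t target_file_offset)
{
    /* The target must still lie inside the 64KB segment once loaded. */
    assert((uint32_t)target_file_offset + COM_LOAD_OFFSET <= 0xFFFFu);

    uint16_t target_ip = (uint16_t)(COM_LOAD_OFFSET + target_file_offset);
    uint16_t next_ip   = (uint16_t)(COM_LOAD_OFFSET + JMP_NEAR_LEN); /* 0x103 */
    uint16_t rel       = (uint16_t)(target_ip - next_ip); /* wraps mod 2^16 */

    out[0] = 0xE9;                    /* JMP rel16 opcode */
    out[1] = (uint8_t)(rel & 0xFF);   /* displacement, low byte first */
    out[2] = (uint8_t)(rel >> 8);
}
```

For a 0x200-byte COM file, jumping to the end of the file means target_file_offset is 0x200, which gives the bytes E9 FD 01. Note that overwriting the first three bytes destroys whatever instruction was there, so the original bytes would have to be saved if the program is meant to keep working.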
On XP there should be debug.exe. Simply start it, begin assembling with 'a', type jmp ff00, and then unassemble the result with 'u' if the corresponding hex dump was not already shown.
Notice first that your program is necessarily specific to an operating system, ABI, and machine instruction set (e.g. it won't run under Linux/x86-64 or Linux/PowerPC).
You could write the machine instructions in C as a sequence of bytes. Which bytes you have to write (i.e. the encoding of the appropriate jump instructions) is left to you!
Of course, that is not portable C. But you could basically do a memcpy from an appropriate source byte buffer.
Maybe libraries like asmjit or GNU lightning might inspire you.
You probably cannot use them directly, but studying their code could help you.
See also x86 wikipedia pages for more references.
For an assignment I need to translate some C declarations to assembly using only AVR directives.
I'm wondering if anyone could give me some advice on learning how to do this.
For example:
translate 'char c;' and 'char* d;' to assembly statements
Please note, this is the first week I'm learning assembly.
Any help/advice would be appreciated.
First, char c; and char* d; are declarations not statements.
What you can do is dump the assembly output of your C program with the avr-gcc option -S:
# Dump assembly to stdout
avr-gcc -mmcu=your_avr_mcu -S -c source.c -o -
Then you can reuse the relevant assembly output parts to create inline assembler statements.
Look here on how to write inline assembler with avr-gcc:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
Without a compiler that you can disassemble from (avr-gcc is easy to come by), it may be difficult to try to understand what happens when a high level language is compiled.
You are simply declaring that you want a variable or an address when you use declarations like that. That doesn't necessarily, or immediately, place something in the assembly. Often dead code and other things are removed by the compiler. Other times it is not until the very end of the compile process that you know where your variable may end up. Sometimes a char only ever lives in a register, so for a short period of time in the program that variable has a home. Sometimes the variable has a longer life and has to have a home the whole time the program is running; there are not enough registers to keep it in one forever, so it will get a memory location allocated to it.
Likewise a pointer is an address which also lives in registers or in memory locations.
If you don't have a compiler, you can't experiment with adding C code and seeing what happens. And even if you do, you need to get the instruction set documentation for the desired processor family.
http://www.atmel.com/Images/doc0856.pdf
Look at the add operation for example add rd,rr, and it shows you that d and r are both between 0 and 31 so you could have add r10,r23. And looking at the operation that means that r10 = r10 + r23. If you had char variables that you wanted to add together in C this is one of the instructions the compiler might use.
There are two lds instructions: a single 16-bit word version and one that takes two 16-bit words (usually the assembler chooses for you). rd is between 0 and 31 and k is an address in the memory space. If you have a global variable declared, it will likely be accessed using lds or sts. So k is a pointer, a fixed pointer. Your char * in C can turn into a fixed number at compile time, depending on what you do with that pointer in your code. If it is a dynamic thing, then look at the flavors of ld and st that use register pairs. So you might see a char * pointer turn into a pair of registers or a pair of memory locations that hold the pointer itself; then the code might use the x, y, or z register pairs (see ld and st), and maybe an adiw to add an offset to that pointer before using ld or st.
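A small test file along these lines can be fed to avr-gcc -S to see what is described above. This is only a sketch for experimenting: the function names are made up, and which instructions actually appear (lds/sts for the global, ld/st through a register pair for the pointer, add for the addition) depends on the compiler version and optimization level, so treat those as things to look for rather than guaranteed output.

```c
#include <assert.h>
#include <stddef.h>

char c;        /* one byte of storage with a fixed address after linking */
char *d;       /* a pointer: two bytes on AVR, wider on a desktop host */

/* Access through the fixed address of a global variable. */
char read_c(void)
{
    return c;
}

/* Access through a pointer whose value is only known at run time. */
char read_via_d(void)
{
    assert(d != NULL);
    return *d;
}

/* 8-bit addition of two char values. */
char add_chars(char a, char b)
{
    return (char)(a + b);
}
```

In the assembly dump, the declarations themselves show up as assembler directives that reserve storage for c and d, while the functions show how that storage is accessed.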
I have a simulator at http://github.com/dwelch67/avriss, but it needs work and is not fully debugged (unless you want to learn the instruction set through examining a simulator and its potential bugs). simavr and some others are out there that you can use to watch your code execute: http://gitorious.org/simavr
I am working on code where I need to get the instructions executed by a program, given the instruction pointers. Assume for now that I have a mechanism that provides me with the addresses of the instructions; would it be possible to get the opcode from this (on an IA32 instruction set)?
You need an in-memory disassembler, such as BeaEngine or DiStorm. These can be passed a memory address to read from; just make sure the address is readable. If you know the length in bytes of the function, it's a little better to use the Run-Length-Disassemblers also provided on those sites.
If you are looking for hardware supported help, that's not how it works. This needs to be done in software. Your code needs a table of opcodes and instructions and just has to perform a lookup.
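As a rough illustration of the table-lookup idea, here is a deliberately tiny sketch in C. It assumes 32-bit protected mode and only recognizes a handful of one-byte opcodes; real IA-32 decoding also has to handle prefixes, two-byte opcodes, ModRM/SIB bytes and immediates, which is why an existing disassembler library is the practical choice. The names opcode_entry, OPCODES and lookup_mnemonic are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the lookup table: first opcode byte and a mnemonic. */
typedef struct {
    uint8_t opcode;
    const char *mnemonic;
} opcode_entry;

/* A very small subset of one-byte IA-32 opcodes (32-bit mode, no prefixes). */
static const opcode_entry OPCODES[] = {
    {0x55, "push ebp"},
    {0x90, "nop"},
    {0xC3, "ret"},
    {0xCC, "int3"},
    {0xE8, "call rel32"},
    {0xE9, "jmp rel32"},
    {0xEB, "jmp rel8"},
};

/* Look up the byte at the given instruction address in the table.
 * The caller must make sure the address is readable. */
static const char *lookup_mnemonic(const uint8_t *code)
{
    assert(code != NULL);
    for (size_t i = 0; i < sizeof OPCODES / sizeof OPCODES[0]; i++) {
        if (OPCODES[i].opcode == code[0])
            return OPCODES[i].mnemonic;
    }
    return "(unknown)";
}
```

Given an instruction pointer ip, the call would be lookup_mnemonic((const uint8_t *)ip); anything not in the table comes back as "(unknown)".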
What you describe is known as disassembly. There are many open source disassemblers and if you could use one of those it would make your task very simple. Look here: http://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers