How do I determine the start and end of instructions in an object file? - disassembly

So, I've been trying to write an emulator, or at least understand how stuff works. I have a decent grasp of assembly, particularly Z80 and x86, but I've never really understood how an object file (or in my case, a .gb ROM file) indicates the start and end of an instruction.
I'm trying to parse out the opcode for each instruction, but it occurred to me that it's not like there's a line break after every instruction. So how does this happen? To me, it just looks like a bunch of bytes, with no way to tell the difference between an opcode and its operands.

For most CPUs - and I believe Z80 falls in this category - the length of an instruction is implicit.
That is, you must decode the instruction in order to figure out how long it is.

If you're writing an emulator you never really need to obtain a full disassembly. You know what the program counter is now; you know whether you're expecting a fresh opcode, an address, a CB-page opcode, or whatever, and you just deal with it. What people end up writing, in effect, is usually a per-opcode recursive descent parser.
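To make that concrete, here is a minimal sketch of such a step function in C. The memory array and register names are made up, but the opcodes shown are real Game Boy ones; the point is that each case consumes exactly the operand bytes it needs, so the instruction boundary falls out of the decode itself:

typedef unsigned char u8;
typedef unsigned short u16;

u8 mem[0x10000];   /* hypothetical flat address space */
u16 pc;

void step(void) {
    u8 opcode = mem[pc++];          /* fetch: pc now points past the opcode byte */
    switch (opcode) {
    case 0x00:                      /* NOP: 1 byte, nothing more to fetch */
        break;
    case 0x3E: {                    /* LD A,n: opcode plus one immediate byte */
        u8 n = mem[pc++];
        /* ... load n into A ... */
        break;
    }
    case 0xC3: {                    /* JP nn: opcode plus a 16-bit address */
        u16 lo = mem[pc++];
        u16 hi = mem[pc++];
        pc = (u16)(hi << 8 | lo);   /* the operand decides where pc goes next */
        break;
    }
    /* ... one case per opcode; each consumes exactly its own operands ... */
    }
}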
To get to a full disassembler, most people add some mild simulation, recursively tracing control flow: whatever can be reached from the entry points is code, and whatever is left over is, by deduction, data.
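A sketch of that tracing, with hypothetical helper functions standing in for a real decoder:

#include <stdbool.h>

extern unsigned char rom[0x8000];
bool is_code[0x8000];

/* Hypothetical decoder hooks a real implementation would provide. */
extern unsigned int insn_length(const unsigned char *p);
extern bool is_branch(const unsigned char *p);
extern unsigned int branch_target(const unsigned char *p, unsigned int addr);
extern bool falls_through(const unsigned char *p);

void trace(unsigned int addr) {
    while (addr < sizeof rom && !is_code[addr]) {
        unsigned int len = insn_length(&rom[addr]);
        for (unsigned int i = 0; i < len; i++)
            is_code[addr + i] = true;            /* operand bytes belong to the insn */
        if (is_branch(&rom[addr]))
            trace(branch_target(&rom[addr], addr)); /* follow the taken path too */
        if (!falls_through(&rom[addr]))
            return;                              /* e.g. unconditional jump/return */
        addr += len;
    }
}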
Not so much on the GB where storage was plentiful (by comparison) and piracy had a physical barrier, but on other platforms it was reasonably common to save space or to effect disassembly-proof code by writing code where a branch into the middle of an opcode would create a multiplexed second stream of operations, or where the same thing might be achieved by suddenly reusing valid data as valid code. One of Orlando's 6502 efforts even re-used some of the loader text — regular ASCII — as decrypting code. That sort of stuff is very hard to crack because there's no simple assembly for it and a disassembler therefore usually won't be able to figure out what to do heuristically. Conversely, on a suitably accurate emulator such code should just work exactly as it did originally.

Related

Mapping inputs to an array (A better way?)

I'm working in an embedded system and have "mapped" some defines to an array for inputs.
volatile int INPUT_ARRAY[40];
#define INPUT01 INPUT_ARRAY[0]
#define INPUT02 INPUT_ARRAY[1]
// section 2
if ( INPUT01 && INPUT02 ) {
writepin(outputpin, value);
}
If I want to read from Input 1, I can simply say newvariable = INPUT01, or I can compare data with Input 1, as in section 2 of my code. I'm not sure if this is a normal way of mapping the name INPUT01 to the array position, or of handling an input pin in the first place. Each array value represents a binary pin; the values are read into the array by decoding a 16-bit port value. Question: Is using the defines and array like this reasonably efficient?
Yes, your solution is efficient.
Before the C compiler even sees your code, the C preprocessor substitutes INPUT_ARRAY[0] for INPUT01 and, similarly, INPUT_ARRAY[1] for INPUT02; so this substitution uses zero time and zero power at run time.
Moreover, when the C compiler sees INPUT_ARRAY[1] in the preprocessed code, it adds 1 at compile time to the base address of INPUT_ARRAY. Therefore, you get maximal efficiency at run time.
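You can see the first half of this for yourself: after the preprocessor runs (gcc -E myfile.c shows its output), section 2 of your code reaches the compiler literally as

if ( INPUT_ARRAY[0] && INPUT_ARRAY[1] ) {
writepin(outputpin, value);
}

with the names INPUT01 and INPUT02 gone entirely.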
Admittedly, were you manually to turn your C compiler's optimizer off, as with the -O0 option of GCC, then it is conceivable that the compiler would emit assembly code to add the 1 at run time. So don't do that.
The only likely exception to the foregoing would be the case that the base address of INPUT_ARRAY were unknown to the compiler at compile time: not likely because INPUT_ARRAY were dynamically allocated on the heap (which would make little sense for hardware device addressing), but likely because the base address of INPUT_ARRAY were configurable during boot via device configuration registers. Some hardware does this, but if yours does, why, that is exactly the reason your MCU (or MPU) possesses an index-offset indirect addressing mode in the first place.

Though this mode engages the MCU's integer arithmetic unit, [a] the mode does not multiply (multiplication being a power-hungry operation); and, [b] anyway, the mode is such a normal, often-used mode that MCUs are invariably designed to support it efficiently—not perhaps as efficiently as precomputed direct addressing, but as efficiently as one can reasonably expect for such a use. The MCU's manufacturer knows that device pins are things you need to address. The engineer who designed your MCU will have given priority to making the index-offset indirect mode as efficient as possible for this and other reasons.

(You could maybe still cheat the matter to save a few millijoules via self-modifying code, if your MCU even allowed that; but, as an engineer, you'd regret the cheat, I suspect, unless security and maintainability were non-issues to you. The problem probably is not much of a real problem. Index-offset indirect addressing is the normal technique when the base address remains unknown until run time. If you really need to save that last millijoule, then you might not be using a C compiler for your code's inner loop, anyway, but might be handcrafting assembly code.)
I suspect that you would find it instructive to tell your compiler to emit assembly code for your inspection. I do not know which compiler you are using but, if you were using GCC, then gcc -S myfile.c would leave the assembly in myfile.s for you to read.

How does the program counter in register 15 expose the pipeline?

I have a question assigned to me that states:
"The ARM puts the program counter in register r15, making it visible to the programmer. Someone writing about the ARM stated that this exposed the ARM's pipeline. What did he mean, and why?"
We haven't talked about pipelines in class yet so I don't know what that is and I'm having a hard time understanding the material online. Can someone help me out with either answering the question or helping me understand so I can form my own answer?
Thanks!
An exposed pipeline is one where the programmer needs to take the pipeline into account in order to write correct code; I would argue that the r15 value offset is nothing more than an encoding constant.
By making the PC visible to the programmer, yes, some fragment of the early implementation detail has been 'exposed' as an architectural oddity which needs to be maintained by future implementations.
This wouldn't have been worthy of comment if the offset designed into the architecture had been zero - there would have been no optimisation possible for simple 3-stage pipelines, and everyone would have been none the wiser.
There is nothing 'exposed' from the pipeline, not in the way that trace or debug allow you to snoop on timing behaviour as code is running - this feature is just part of the illusion that the processor hardware presents to a programmer (similar to each instruction executing in program order).
A problem with novel tricks like this is that people like to write questions about them, and those questions can easily be poorly phrased. They also neglect the fact that even if the pipeline is 3-stage, it only takes a single 'special case' to require the gates for an offset calculation (even if those gates don't consume power in typical operation).
Having PC-relative instructions is rather common. Having implementation-optimised encodings for how the offset is calculated is also common - for example, the IBM 650.
Retrocomputing.SE is an interesting place to learn about some of the things that relate to the evolution of modern computers.
It doesn't really; what they are probably talking about is that the program counter is two instructions ahead of the instruction being executed. But that doesn't mean it's a two- or three-deep pipe, if it ever was. It exposes nothing at this point, in the same way that the branch shadow in MIPS exposes nothing. There is the textbook MIPS and there is reality.
There is nothing magical about a pipeline; it is a computer version of an assembly line. You can build a car in place and bring the engine, the doors, the wheels, etc. to the car. Or you can move the car through an assembly line and have a station for doors, a station for wheels, and so on. You have many cars being built at once, and a car comes out of the building every few minutes, but that doesn't mean a new car takes only a few minutes to build. It just means the slowest step takes a few minutes; front to back, the whole build takes roughly the same amount of time either way.
An instruction has several fairly obvious steps: an add requires that you get the instruction, decode it, gather up the operands, feed them to an adder (ALU), and store the result. Other instructions have similar steps and a similar number of them.
Your textbook will use terms like fetch, decode, execute. So suppose you fetch an instruction at 0x1000, then one at 0x1004, and then one at 0x1008, hoping the code runs linearly with no branches. While 0x1004 is being fetched, 0x1000 is being decoded; while 0x1008 is being fetched, 0x1004 is being decoded and 0x1000 might be up to execution, depending on the design. One might then think: when 0x1000 is being executed, the program counter is fetching 0x1008, so that tells me how the pipeline works. Well, it doesn't. I could have a 10,000-deep pipeline and have the program counter that the instruction sees be any address I like relative to the address of that instruction; I could have it be 0x1000 for the instruction at 0x1000 and still have a 12,345-deep pipeline. It's just a definition. It might at some point in history have been put in place because of a real design and a real pipe, or it could always have just been defined that way.
What does matter is that the definition is stated and honoured by the instruction set: if they say the PC is the instruction's address plus some offset, then it must always be that, or the exceptions must be documented, and implementations must match those definitions. Done. Programmers can then program, compilers can be written, etc.
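If you have ARM hardware handy you can watch the architected definition directly. A sketch using GCC inline assembly, assuming 32-bit ARM state (where the value read from r15 is defined as the address of the reading instruction plus 8, whatever the real pipeline depth):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t pc;
    /* Read r15. In ARM state the architecture defines the value read as
       the address of this mov instruction + 8, regardless of how deep
       the actual pipeline is. */
    __asm__ volatile ("mov %0, pc" : "=r"(pc));
    printf("pc read as 0x%08x\n", (unsigned)pc);
    return 0;
}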
A textbook problem with pipelines (not saying it isn't real) goes like this: say we run out of V8 engines. We have 12 trucks in the assembly line; we have been building trucks on that line for some period of time, and next we will build cars with V6s. The engines will arrive by slow boat once they are built, but we have car parts ready now, so let's move the trucks off the line and start the line over. For N steps in the assembly line, no cars come out the other end of the building; once the first car makes it to the end, a new car follows every few minutes. We had to flush the assembly line. The same happens when you are running instructions 0x1000, 0x1004, 0x1008, etc. If you can keep the pipeline moving, it's more efficient, but what if 0x1004 is a branch to 0x1100? We might already have 0x1008's and 0x100c's instructions in the pipe, so we have to flush the pipe and start fetching from 0x1100, and it takes some number of clock cycles before we are completing instructions again; then ideally we complete one per clock until the next branch. So if you read the classic textbook on the subject, where MIPS or an educational predecessor is used, they have this notion of a branch shadow or some other similar term: the instruction after the branch is always executed.
So instead of flushing N instructions out of the pipe you flush N-1 instructions, and you get an extra clock to get the next instruction after the branch into the pipeline. And that's how MIPS works by default, but when you buy a real core you can turn that off and have it not execute the instruction after the branch. It's a great textbook illustration and was probably real, and probably still is real for "let's build a MIPS" classes in computer engineering. But the pipelines in use today don't wait that long to see that the pipe is going to be empty; they can start fetching early and sometimes gain more than one clock rather than flushing the whole pipe. Whatever it once was, it does not currently give MIPS any kind of advantage over other designs, nor does it give us exposure to their pipeline.

I want to create a simple assembler in C. Where should I begin? [duplicate]

This question already has answers here:
Building an assembler
(4 answers)
How Do You Make An Assembler? [closed]
(4 answers)
Closed 9 years ago.
I've recently been trying to immerse myself in the world of assembly programming with the eventual goal of creating my own programming language. I want my first real project to be a simple assembler written in C that will be able to assemble a very small portion of the x86 machine language and create a Windows executable. No macros, no linkers. Just assembly.
On paper, it seems simple enough. Assembly code comes in, machine code comes out.
But as soon as I start thinking about all the details, it suddenly becomes very daunting. What conventions does the operating system demand? How do I align data and calculate jumps? What does the inside of an executable even look like?
I'm feeling lost. There aren't any tutorials on this that I could find and looking at the source code of popular assemblers was not inspiring (I'm willing to try again, though).
Where do I go from here? How would you have done it? Are there any good tutorials or literature on this topic?
I have written a few myself (assemblers and disassemblers) and I would not start with x86. If you know x86 or any other instruction set, you can pick up the syntax for another instruction set in short order (an evening/afternoon), at least the lion's share of it. The act of writing an assembler (or disassembler) will definitely teach you an instruction set, fast, and you will know that instruction set better than many seasoned assembly programmers for it who have never examined the machine code at that level. msp430, pdp11, and thumb (not thumb2 extensions) (or mips or openrisc) are all good places to start: not a lot of instructions, not overly complicated, etc.
I recommend a disassembler first, and with that a fixed-length instruction set like arm or thumb or mips or openrisc, etc. If not, then at least use a disassembler (definitely choose an instruction set for which you already have an assembler, linker, and disassembler) and, with pencil and paper, understand the relationship between the machine code and the assembly, in particular the branches: they usually have one or more quirks, like the program counter being an instruction or two ahead when the offset is added, and to gain another bit they sometimes measure the offset in whole instructions rather than bytes.
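To show why fixed length makes this easy, here is a sketch of the disassembler loop for a made-up 32-bit ISA; the field layout is invented purely for illustration, but the loop structure is the same for any fixed-length target:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical ISA: top 8 bits opcode, then two 4-bit registers,
   low 16 bits an immediate. Real ISAs differ; the structure does not. */
void disassemble(const uint32_t *words, unsigned count, uint32_t base)
{
    for (unsigned i = 0; i < count; i++) {
        uint32_t w    = words[i];
        uint32_t op   = w >> 24;
        uint32_t rd   = (w >> 20) & 0xF;
        uint32_t rs   = (w >> 16) & 0xF;
        uint32_t imm  = w & 0xFFFF;
        uint32_t addr = base + i * 4;   /* fixed length: next insn is always +4 */
        switch (op) {
        case 0x01: printf("%08x: mov r%u,r%u\n", addr, rd, rs);    break;
        case 0x02: printf("%08x: ldi r%u,#0x%x\n", addr, rd, imm); break;
        default:   printf("%08x: .word 0x%08x\n", addr, w);        break;
        }
    }
}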
It is pretty easy to brute-force parse the text with a C program to read the instructions. A harder task, but perhaps as educational, would be to use bison/flex and learn that language, letting those tools create (an even more extreme brute-force) parser which then interfaces with your code to tell you what was found where.
The assembler itself is pretty straightforward: just read the ASCII and set the bits in the machine code. Branches and other PC-relative instructions are a little more painful, as they can take multiple passes through the source/tables to completely resolve. Take these two lines:
mov r0,r1
mov r2 ,#1
The assembler begins by parsing the text for a line (a line being defined as the bytes up to a carriage return 0x0D or line feed 0x0A). Discard the white space (spaces and tabs) until you get to something that is not white space, then strncmp that against the known mnemonics. If you hit one, parse the possible combinations of that instruction. In the simple case above, after the mov, skip over the white space to the next non-white-space; perhaps the first thing you find must be a register, then optional white space, then a comma. Remove the white space and comma and compare what is left against a table of strings, or just parse through it. Once that register is done, go past the comma and, let's say, what follows is either another register or an immediate. If it is an immediate, let's say it has to have a # sign; if a register, let's say it has to start with a lower- or upper-case 'r'. After parsing that register or immediate, make sure there is nothing else on the line that shouldn't be there. Build the machine code for this instruction, or at least as much of it as you can, and move on to the next line. It may be tedious, but it is not difficult to parse ASCII...
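A sketch of that in C, using the two example lines above; the emit printouts are placeholders for actually setting the bits:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *skip_ws(char *p) { while (*p == ' ' || *p == '\t') p++; return p; }

static int parse_reg(char **pp)   /* "r0".."r15" -> 0..15, or -1 on error */
{
    char *p = skip_ws(*pp);
    if (tolower((unsigned char)*p) != 'r' || !isdigit((unsigned char)p[1])) return -1;
    long n = strtol(p + 1, &p, 10);
    *pp = p;
    return (n <= 15) ? (int)n : -1;
}

static int assemble_line(char *line)
{
    char *p = skip_ws(line);
    if (strncmp(p, "mov", 3) != 0) return -1;   /* unknown mnemonic */
    p += 3;
    int rd = parse_reg(&p);
    p = skip_ws(p);
    if (rd < 0 || *p != ',') return -1;
    p = skip_ws(p + 1);
    if (*p == '#') {                            /* mov rd,#imm */
        long imm = strtol(p + 1, &p, 0);
        printf("emit: mov r%d,#%ld\n", rd, imm);
    } else {                                    /* mov rd,rs */
        int rs = parse_reg(&p);
        if (rs < 0) return -1;
        printf("emit: mov r%d,r%d\n", rd, rs);
    }
    p = skip_ws(p);
    return (*p == '\0' || *p == '\n') ? 0 : -1; /* reject trailing junk */
}

int main(void)
{
    char a[] = "mov r0,r1", b[] = "mov r2 ,#1";
    assemble_line(a);
    assemble_line(b);
    return 0;
}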
At a minimum you will want a table/array that accumulates the machine code/data as it is created, plus some method for marking instructions as incomplete: the PC-relative instructions to be finished on a future pass. You will also want a table/array that collects the labels you find, with the address/offset in the machine code table where each was found, as well as the labels used in instructions as a destination/source and the offset in the table/array holding the partially complete instruction they go with. After the first pass, go back through these tables until you have matched up all the label definitions with the labels used as a source or destination, using the label definition's address/offset to compute the distance to the instruction in question, and then finish creating the machine code for that instruction. (Some disassembly may be required, and/or use some other method for remembering what kind of encoding it was, when you come back later to finish building the machine code.)
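The bookkeeping can be as simple as a few fixed-size arrays; a sketch, where the names, sizes, and encode_branch are all hypothetical:

#include <stdint.h>
#include <string.h>

#define MAX_CODE   4096
#define MAX_LABELS 256
#define MAX_FIXUPS 256

/* Stand-in for re-encoding a partially built instruction once the
   distance to its label is known. */
extern uint32_t encode_branch(uint32_t partial, int distance_in_words);

uint32_t code[MAX_CODE];                 /* accumulated machine code/data */
unsigned code_len;

struct label { char name[32]; unsigned offset; };  /* "loop:" defined at code[offset] */
struct fixup { char name[32]; unsigned offset; };  /* "b loop" waiting at code[offset] */

struct label labels[MAX_LABELS]; unsigned nlabels;
struct fixup fixups[MAX_FIXUPS]; unsigned nfixups;

/* Later pass: match every used label against its definition and finish
   the instruction using the computed distance. */
void resolve(void)
{
    for (unsigned f = 0; f < nfixups; f++)
        for (unsigned l = 0; l < nlabels; l++)
            if (strcmp(fixups[f].name, labels[l].name) == 0)
                code[fixups[f].offset] =
                    encode_branch(code[fixups[f].offset],
                                  (int)labels[l].offset - (int)fixups[f].offset);
}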
The next step is allowing for multiple source files, if that is something you want to allow. Now you have labels that don't get resolved by the assembler, so you have to leave placeholders in the output and emit some flavour of the longest jump/branch instruction, because you don't know how far away the destination will be: expect the worst. Then there is the output file format you choose to create/use, and then there is the linker, which is mostly simple, but you have to remember to fill in the machine code for the final PC-relative instructions; that is no harder than it was in the assembler itself.
Note, writing an assembler is not necessarily related to creating a programming language and then writing a compiler for it; they are separate things, different problems. Actually, if you want to make a new programming language, just use an existing assembler for an existing instruction set. Not required, of course, but most teachings and tutorials are going to use the bison/flex approach for programming languages, and there are many college course lecture notes/resources out there for beginning compiler classes that you can just use to get started, then modify the script to add the features of your language. The middle and back ends are a bigger challenge than the front end. There are many books on this topic and many online resources as well. As mentioned in another answer, LLVM is not a bad place to create a new programming language: the middle and back ends are done for you, and you only need to focus on the programming language itself, the front end.
You should look at LLVM. LLVM is a modular compiler back end; the most popular front end is Clang, for compiling C/C++/Objective-C. The good thing about LLVM is that you can pick the part of the compiler chain that you are interested in and just focus on that, ignoring all the others. If you want to create your own language, write a parser that generates the LLVM internal representation, and you get all of the target-independent middle-layer optimisations and compilation to many different targets for free. Interested in a compiler for some exotic CPU? Write a compiler back end that takes the LLVM intermediate code and generates your assembly. Have some ideas about optimisation techniques, automatic threading perhaps? Write a middle layer which processes LLVM intermediate code. LLVM is a collection of libraries, not a standalone binary like GCC, so it is very easy to use in your own projects.
What you're looking for is not a tutorial or source code, it's a specification. See http://msdn.microsoft.com/en-us/library/windows/hardware/gg463119.aspx
Once you understand the specification of an executable, write a program to generate one. The executable you build should be as simple as possible. Once you have mastered that, then you can write a simple line-oriented parser that reads instruction names and numeric arguments to generate a block of code to plug into the exe. Later you can add symbols, branches, sections, whatever you want, and that's where something like http://www.davidsalomon.name/assem.advertis/asl.pdf will come in.
P.S. Carl Norum has a good point in the comment above. If your goal is to create your own programming language, learning to write an assembler is irrelevant and very much not the right way to start (unless the language you want to create is an assembly language). There are already assemblers that produce executables from assembler source, so your compiler could produce assembler source and you could avoid the work of recreating the assembler, and you should. Or you could use something like LLVM, which will solve many other daunting problems of compiler construction. The odds are very small that you will ever actually produce your own programming language, but they're much smaller if you start from scratch, and there's no need to. Decide what your goal is and use the best tools available to achieve it.

arm (bare metal): call binary file as function

I have the AT91Bootloader for an AT91sam9 ARM controller. I need to add some extra hardware initialization, but I only have a compiled .bin file.
I loaded bin file to memory and tried to call it:
((void (*)())0x00005000)();
But there were no results. Please use as little assembler as possible; I was introduced to assembler before, but cannot understand ARM assembler due to its complexity. How can I make a call from the middle of the bootloader, execute the bin file (it will be in some memory sector, 0x00005000 for example), and then return to the bootloader and continue executing its own code?
If ARM asm is "too complex", you will find it very difficult to debug any problems you're having. Basic* ARM assembly is one of the least complex assembly languages I've come across.
Your code ought to work (though I would not use a hard-coded address there) provided the ".bin" is of the correct format. Common issues:
The entry point should be ARM code; some compilers default to Thumb. It's possible (if a little tricky) to make Thumb code work.
The entry point needs to be at the start of the file. Without disassembling, it's hard to tell if you've done this correctly.
The linker will insert "thunks" (a.k.a. "stubs") where necessary. A quirk in some linkers means that the thunk can be placed before the entry point. You can work around this by using --stub-group-size=-1 (docs here).
* Ignoring things like Thumb/VFP/NEON which you don't need to get started.
ARM assembly is one of the simpler ones, very straightforward. If you want to continue to do bare metal, you are going to need to learn at least some assembly, for example to understand Alexey's comment.
The instruction you are looking for is BX, it branches to an address, the assembly you need to branch to the code your bootloader downloaded is:
.globl tramp
tramp:
bx r0    @ branch to the address in r0 (the first C argument per the AAPCS)
The C prototype is
void tramp ( unsigned int address );
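A minimal usage sketch from the bootloader side, assuming the raw binary has already been copied to your example address and hands control back with bx lr:

void run_loaded_bin ( void )
{
    /* the downloaded .bin has already been copied to 0x00005000 */
    tramp ( 0x00005000 );   /* bx r0: execution starts at the blob's first word */
    /* if the blob finishes with bx lr, execution returns here and the
       bootloader carries on */
}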
As mentioned in the comments, the program needs to be compiled for the address you are running it from, and/or it needs to be position-independent; otherwise it won't work. Also, you need to build the application with the proper entry point: if it is a raw binary and you branch to the address where the binary was loaded, the binary needs to be able to be started that way, by having the first word in the binary be the entry point for execution.
Also understand that an ELF-format file contains the data you want to load, but as a whole it is not the data you want to load. It is a "binary file", yes, but to run the program contained in it you need to parse it, extract the loadable portions, and load them at the right places.
If you don't know what those terms mean, use Google and/or search SO; the answers are there.

x86 way to tell instruction from data

Is there a more or less reliable way to tell whether data at some location in memory is a beginning of a processor instruction or some other data?
For example, E8 3F BD 6A 00 may be call instruction (E8) with relative offset of 0x6ABD3F, or it might be three bytes of data belonging to some other instruction, followed by push 0 (6A 00).
I know the question sounds silly and there is probably no simple way, but maybe instruction set was designed with this problem in mind and maybe some simple code examining +-100 bytes around the location can give an answer that is very likely correct.
I want to know this because I scan a program's code and replace all calls to some function with calls to my replacement. It's working so far, but it's not impossible that at some point, as I increase the number of functions I'm replacing, some data will look exactly like a function call to that exact address, and will be replaced, and this will cause the program to break in a most unexpected fashion. I want to reduce the probability of that.
If it is your code (or other code that retains linking and debug info), the best way is to scan the symbol/relocation tables in the object file. Otherwise there is no reliable way to determine whether some byte is an instruction or data.
Possibly the most efficient method for qualifying data is recursive disassembling: i.e., disassembling the code from the entry point and from all jump destinations found. But this is not completely reliable, because it does not traverse jump tables (you can try some heuristics for this, but those are not completely reliable either).
A solution for your problem would be to patch the replaced function itself: overwrite its beginning with a jump instruction to your function.
Unfortunately, there is no 100% reliable way to distinguish code from data. From the CPU's point of view, code is code only when some jump opcode induces the processor into trying to execute the bytes as if they were code. You could try to make a control-flow analysis by beginning with the program entry point and following all possible execution paths, but this may fail in the presence of function pointers.
For your specific problem: I gather that you want to replace an existing function with a replacement of your own. I suggest that you patch the replaced function itself. I.e., instead of locating all calls to the foo() function and replacing them with a call to bar(), just replace the first bytes of foo() with a jump to bar() (a jmp, not a call: you do not want to mess with the stack). This is less satisfactory because of the double jump, but it is reliable.
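For reference, on 32-bit x86 Windows that patch is only five bytes; a sketch (the Win32 calls are real, error handling is omitted, and the original bytes are deliberately not preserved here, which is what the trampoline libraries mentioned below solve):

#include <windows.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the first 5 bytes of `target` with `jmp rel32` to `replacement`.
   One-way as written: the original bytes are not saved, so foo() can no
   longer be entered at its first instruction afterwards. */
static void hook(void *target, void *replacement)
{
    uint8_t *p = (uint8_t *)target;
    DWORD old;
    VirtualProtect(p, 5, PAGE_EXECUTE_READWRITE, &old);
    int32_t rel = (int32_t)((uint8_t *)replacement - (p + 5)); /* rel32 counts from the next insn */
    p[0] = 0xE9;                                               /* jmp rel32 opcode */
    memcpy(p + 1, &rel, sizeof rel);
    VirtualProtect(p, 5, old, &old);
    FlushInstructionCache(GetCurrentProcess(), p, 5);
}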
It is impossible to distinguish data from instructions in general, and this is because of the von Neumann architecture. Analyzing the surrounding code is helpful, and disassembly tools do this. (This may be helpful. If you can't use IDA Pro /it is commercial/, use another disassembly tool.)
Plain code has a very specific entropy, so it's quite easy to distinguish it from most data. However, this is a probabilistic approach: a large enough buffer of plain code can be recognized (especially compiler output, where you can also recognize patterns, like the beginning of a function).
Also, some opcodes are reserved for future use, and others are available only from kernel mode. By knowing those, and knowing how to compute instruction lengths (you could try a routine written by Z0mbie for that), you can do better.
Thomas suggests the right idea. To implement it properly, you need to disassemble the first few instructions (the part you would overwrite with the JMP) and generate a simple trampoline function that executes them and then jumps to the rest of the original function.
There are libraries that do this for you. A well-known one is Detours, but it has somewhat awkward licensing conditions. A nice implementation of the same idea with a more permissive license is Mhook.
