Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am not looking for an assembly emulator.
What I am trying to do is translate assembly to C.
I have IDA Pro with its "F5" decompile function, but
I want to take a simulation approach instead.
I made this example by hand to demonstrate my idea:
mov %eax, 10
add %eax, 5
jmp foo
I want to translate it directly into a simulated C procedure like this:
unsigned v_eax = 0;
v_eax = 10;
v_eax += 5;
goto foo;
I think this is much like an assembly simulator, which works like this:
assembly --> run in a CPU simulator written in C --> output the results
But what I am trying to do is this:
assembly --> translate into C source code --> compile --> run to get the results
After a quick search, I think this paper takes an approach similar to what I am trying to do (however, I don't need any analysis work, just translation of some simple assembly code).
Could anyone give some help on this issue?
Thank you!
What help are you looking for? If you have specific questions, ask those questions.
It looks like you've already got the general idea: Set up a bunch of variables to represent the registers, set up a large array to represent the memory, implement either subroutines or macros (chunks of code generated in-line) that represent each instruction and do the Right Thing with those resources, implement additional macros or subroutines which are wrappers for or equivalent to every operating system call or external library function which the programs might invoke (I/O most importantly), write a "loader" for the executable file, then go through the program converting instructions to those macros. Be sure to fix up goto/call addresses properly, and hope like heck that the programmers kept data blocks and code blocks distinct. Get it all debugged, and it should work. Extremely slowly, but that's what you've asked for.
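A minimal sketch of that layout, assuming a 32-bit machine with a small flat memory. The register names, the memory size, and the MOV/ADD/JMP/PUSH/POP macros are illustrative choices, not a complete instruction set; a real translator would also need flags, addressing modes, and wrappers for system calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated machine state: one variable per register, one array for memory.
   The size and the register set are assumptions made for this sketch. */
#define MEM_SIZE 65536u
static uint32_t v_eax, v_ebx, v_esp;
static uint8_t  v_mem[MEM_SIZE];

/* Memory helpers with a bounds check on every access. */
static void store32(uint32_t addr, uint32_t val)
{
    assert(addr <= MEM_SIZE - 4);
    memcpy(&v_mem[addr], &val, 4);
}

static uint32_t load32(uint32_t addr)
{
    uint32_t val;
    assert(addr <= MEM_SIZE - 4);
    memcpy(&val, &v_mem[addr], 4);
    return val;
}

/* One macro per instruction; the translator emits one macro call per line. */
#define MOV(dst, src) ((dst) = (src))
#define ADD(dst, src) ((dst) += (src))
#define JMP(label)    goto label
#define PUSH(src)     (v_esp -= 4, store32(v_esp, (src)))
#define POP(dst)      ((dst) = load32(v_esp), v_esp += 4)

/* Translation of: mov eax,10 / add eax,5 / jmp foo / mov eax,0 /
   foo: push eax / pop ebx. Returns the final value of ebx. */
uint32_t translated_program(void)
{
    v_esp = MEM_SIZE;   /* the "loader" sets up the stack pointer */
    MOV(v_eax, 10);
    ADD(v_eax, 5);
    JMP(foo);
    MOV(v_eax, 0);      /* skipped by the jump above */
foo:
    PUSH(v_eax);
    POP(v_ebx);
    return v_ebx;
}
```

Calling translated_program() runs the translated instructions against the simulated state and returns the value left in v_ebx, which is 15 for this sequence.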
Closed 4 years ago.
I'm currently making a program in C that needs to find billions of square roots. I looked up which known code finds the square root faster and came across this code which is seemingly the fastest. https://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi
double inline __declspec (naked) __fastcall sqrt(double n)
{
_asm fld qword ptr[esp + 4]
_asm fsqrt
_asm ret 8
}
I don't know much about assembly language so can someone please explain what this code does algorithmically and what those keywords mean?
This is a Microsoft-specific naked, __fastcall version of the standard sqrt function.
For details, please check the Microsoft documentation.
The naked storage-class attribute is a Microsoft-specific extension to the C language. For functions declared with the naked storage-class attribute, the compiler generates code without prolog and epilog code. You can use this feature to write your own prolog/epilog code sequences using inline assembler code. Naked functions are particularly useful in writing virtual device drivers.
See: Naked functions.
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. This calling convention only applies to the x86 architecture. Take a look at:
__fastcall
__fastcall was introduced a long time ago by Microsoft. Typically, fastcall calling conventions pass one or more arguments in registers, which reduces the number of memory accesses required for the call. With on-chip caching, passing things in registers is not as much of a gain as it used to be.
And __stdcall may actually be faster now.
Closed 5 years ago.
I learned not long ago that most of the asm compilers were written in C or other languages, and we say assembler is the fastest language. But if it's coded in C, how can it be faster than the C itself? What does the compiler do then? Are there compilers of ASM in ASM? I do not really understand how it all works ... I searched on the internet, but I did not find clearly what I was looking for ...
Could you explain, or give me any links that could help me better understand, the concept of assembly compilers?
There are three concepts getting tossed around here:
* Speed of a compiler
* Speed of a processor
* Speed of an executable
First, to get it out of the way, the time it takes to compile some executable has very little relationship to the time it takes for that executable to run. (The compiler can take longer to do some careful analysis and apply optimizations.)
The speed at which your processor can operate is another thing. Assembly language is the closest to machine language, which is what your processor understands. Any given instruction in machine language will operate at the speed that the machine processes that instruction.
Everything that executes on your processor must, by definition, be at some point converted to machine language so that your processor can understand and execute it.
That’s where things get tricky. An assembler will translate code you write directly to machine language, but there is more to a program than just knowing how to convert to machine language. Suppose you have a complex value, such as a collection of options. These options must be maintained as strings, integers, floats, etc. How are they stored? How are they accessed?
The way in which all this is done can vary. The way you organize your program can vary. These variations make a difference in execution time.
So you can write a very slow program using assembly language and a very fast program using an interpreted language. And, frankly, compilers are often better at organizing the final machine code than you are, even if you are using an assembler directly.
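To make the point about organization concrete, here is a small sketch of two ways to store and access a collection of options. The option names, the struct, and both lookup functions are hypothetical, chosen only for illustration. The first compares strings on every access; the second indexes an array directly. The same choice exists in assembly or in C, and it usually matters more than the language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Organization 1: options kept as name/value pairs, found by comparing strings. */
struct named_option {
    const char *name;
    int value;
};

int lookup_by_name(const struct named_option *opts, size_t n, const char *name)
{
    assert(opts != NULL && name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            return opts[i].value;
    return -1;  /* not found */
}

/* Organization 2: each option has a small integer id and lives in a plain array. */
enum option_id { OPT_VERBOSE, OPT_WIDTH, OPT_DEPTH, OPT_COUNT };

int lookup_by_id(const int values[OPT_COUNT], enum option_id id)
{
    assert((unsigned)id < OPT_COUNT);
    return values[id];  /* a single indexed load, no string comparisons */
}
```

Both functions return the same value for the same option; only the amount of work per access differs.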
So to bring it to a point: the compiler’s job is to transform your textual source code (C, or assembly, or whatever) into machine code, which is what your processor understands. Once done, the compiler is no longer necessary.
There is significantly more to it than that, but that is the general idea.
Closed 6 years ago.
I have been learning C and data structures for quite some time now, and I wanted to see whether I could apply what I have learnt. I searched a bit and found out that I could start with util-linux but, before doing so, I thought I'd check and perhaps dabble a bit with the code for basic Unix commands like "cat". I was able to understand what parts of the code might be trying to do, but I was not able to understand the entire code as a unit.
For example, in the "cat" code, pointers to the output buffer and input buffer are declared and used appropriately, which I could understand. What I could not understand are parts of the code like io_blksize (stat_buf), which has no description whatsoever of what it does. Nor do I see how two pointers declared as pointers to the input and output buffers actually correspond to the input and output buffers.
So my question is: how do I approach this kind of code, how can I understand something that has no description of what it does (as in the example above), and how can I make changes in the code so that I can see them when I run a command?
(I would really appreciate references or topics I should start with, so that I can relate what I have learnt to how command code can be modified. I also apologize if the question is too abstract.)
This is a bit of a subjective question so my answers will just be my opinion of course.
A good place to start when you run into something you don't recognise while reading source code is the manpages. Each function will generally have a manpage, e.g. man 2 read or man 3 printf. Beyond that, I feel you should perhaps get more of a foundation in Unix before attempting to read the source code directly. A good book is Advanced Programming in the Unix Environment. I've been working through it myself and am finding my Unix knowledge improving considerably.
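As a rough illustration of what the manpages describe, here is a minimal sketch of the copy loop at the heart of a cat-like program, built only from fstat(2), read(2), and write(2). It is not the actual cat source. The helper pick_blksize is a hypothetical stand-in, written on the assumption that a helper such as io_blksize picks a buffer size from the stat information; the 4096-byte minimum is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for a helper like io_blksize: choose a buffer size
   from the stat result, with an assumed minimum of 4096 bytes. */
static size_t pick_blksize(const struct stat *st)
{
    assert(st != NULL);
    size_t preferred = st->st_blksize > 0 ? (size_t)st->st_blksize : 0;
    return preferred > 4096 ? preferred : 4096;
}

/* Copy everything from in_fd to out_fd. Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(out_fd, &st) < 0)
        return -1;

    size_t size = pick_blksize(&st);
    char *buf = malloc(size);           /* this pointer IS the buffer */
    if (buf == NULL)
        return -1;

    int status = 0;
    while (status == 0) {
        ssize_t n = read(in_fd, buf, size);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            status = -1;
            break;
        }
        char *p = buf;
        while (n > 0) {                 /* write may accept fewer bytes than asked */
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0) {
                status = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    free(buf);
    return status;
}
```

A buffer pointer "corresponds" to a buffer simply because it holds the address returned by malloc; read fills that memory and write drains it. Calling copy_fd(STDIN_FILENO, STDOUT_FILENO) from main gives a bare-bones cat for standard input. Retrying after interrupted system calls is left out of this sketch.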
Just my two cents.
Closed 6 years ago.
I'm writing a language that compiles to C, using C as an intermediate language (IL): I generate C code, and another C compiler, e.g. gcc or clang, then generates the assembly.
About the C code I generate, I have two questions:
* If I do some simple optimisation passes (constant propagation, dead code removal, ...), will this reduce the amount of work the C compiler has to do, or make it harder because it's not really human C code?
* If I were to compile to, say, three-address code or SSA or some other form and then feed this into a C program with functions, labels, and variables, would that make it easier or harder for the C compiler to optimize?
These link together to form the following question...
What is the best way to produce good C code from a language that compiles to C?
Is it worth doing any optimisations at all, or should I leave that to the compiler?
Generally there's not much point doing peephole type optimisations because the C compiler will simply do those for you. What is expensive is a) wasted or unnecessary "gift-wrapping" operations, b) memory accesses, c) branch mispredictions.
For a), make sure you're not passing data about too much, because whilst C will do constant propagation, there's a limit to how far it can detect that two buffers are in fact aliases of the same underlying data. For b), try to keep functions short and operations on the same data together, and limit heap memory use to improve cache performance. For c), the compiler understands for loops, but it doesn't understand goto loops. So it will figure that
for(i=0;i<N;i++)
will usually take the loop body, but it won't figure that
if(++i < N) goto do_loop_again;
will usually take the jump.
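For comparison, here is the same loop in both shapes, as a small sketch; the function names and the summing task are made up for illustration. The two functions compute the same result, so a code generator is free to emit either. How differently a particular compiler treats them is something to measure, for example by comparing the assembly it produces for each.

```c
#include <assert.h>
#include <stddef.h>

/* Structured form: what a human would write. */
long sum_for(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Goto form: what a naive code generator might emit for the same loop. */
long sum_goto(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    if (n == 0)
        goto done;
do_loop_again:
    s += a[i];
    if (++i < n)
        goto do_loop_again;
done:
    return s;
}
```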
So really the rule is to make your generated code as human-like as possible. Though if it's too human-like, that raises the question of what your language has to offer that C doesn't: the whole point of a non-C language is that a nicely structured input script may turn into a spaghetti of gotos in the C source.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
In an argument with a friend, I made the remark that it is impossible to write, in any language besides C, a program that is faster than all variants in C, that do the same thing. My argument was based on an affirmative answer to the question below. Is it true?
If we think of "compiling" as a map from [C programs] to [assembly programs], then is this map surjective?
Caveat: Of course, you can include assembly in C programs, but pretend that isn't possible (makes for a more interesting question!).
The answer to the question If we think of "compiling" as a map from [C programs] to [assembly programs], then is this map surjective? is obviously NO.
It can be proven trivially:
* There could be assembly language instructions that the compiler will not generate, such as int 10, hlt, jmp *eax, iret, sub esp,esp...
* You might be fiddling with registers in assembly that the C compiler never touches, such as segment registers.
There is just a world of creativity in assembly that the C language cannot express.
Regarding the other question, I'm not sure what you mean by
it is impossible to write, in any language besides C, a program that is faster than all variants in C, that do the same thing.
If you mean that a skilled programmer can always write a C program that will be faster at a given task than any other program written in any language, I think you are probably wrong too, because the compiler itself is a fixed variable that is imperfect.
Imagine for example that the C compiler is very dumb and generates unoptimized code. It is obvious that an assembly program can be written that will beat the best C variation at the given task: all that is needed is to optimize the unoptimized code. Since the C compiler is imperfect, you can always find a task for which even the best C variation can be further optimized.