Is there an optimizing assembly compiler? [closed] - c

This may sound silly, but is there an optimizing assembly compiler, the way gcc or MSVC optimize C?
Or at least, is there a usable decompiler that produces compilable C? The code doesn't need to be readable; I just want to go asm -> C -> gcc -O3 -> ELF/PE.
Note: Maybe I should explain why I need this, after seeing a downvote. There is a FreeBASIC compiler that produces the executable directly, but it doesn't optimize well enough. I don't know why they didn't choose C as an intermediate representation. Perhaps I could rewrite the compiler to produce C, but not right now. At least on Linux it's trivial to produce assembly from the executable, so I was thinking there might be a way to optimize that automatically.

Yes. The compiler infrastructure of the Plan 9 operating system uses assemblers and linkers that perform various optimization tasks usually handled by the compiler, such as:
instruction selection (unsupported instructions are emulated)
register allocation (partially)
dead code elimination
some peephole optimization
See here for an introduction to the Go flavour of Plan 9 assembly.

No, and one probably cannot exist in the general case (though optimizing assemblers did exist in the past, in the 1990s, for some MIPS or SPARC workstations, where the assembler code was somewhat restricted). You might try some decompiler (but decompilers cannot work reliably on every program, in particular because not every assembler program is expressible in C, and because a compiler loses information from its source code), and then recompile the decompiled C code with optimization.
BTW, what you really need is an optimizing FreeBASIC compiler (e.g. code your own FreeBASIC-to-C, or FreeBASIC-to-LLVM, translator). Perhaps (if your FreeBASIC program is not big) you could instead translate your FreeBASIC source into something else, by hand or with the help of some script.
Finally, FreeBASIC is free software (GPLv2 licensed). You could spend some effort (months or years) to improve it and make it generate either C or LLVM code, or use the LLVM or GCCJIT just-in-time compilation libraries.
I personally believe it is not worth the effort, because BASIC is a nearly dead language (there are lots of languages better than BASIC: OCaml, Haskell, Common Lisp, Clojure, Scala, Scheme, Go, D, ...; and the existing living source code base in BASIC is not that big today)
It looks like FreeBASIC-1.02.1-source/ (as provided by the FreeBASIC-1.02.1-source.tar.gz archive) contains an inc/llvm-c.bi and a src/compiler/ir-llvm.bas file, so it probably already has some support for generating LLVM. You should investigate (and ask LLVM to optimize more).
Notice that I never used FreeBASIC and probably never will. But I do have the habit of looking inside free software source code, and you should get that habit too...

Related

How to extract all functions out of a compiled ELF file, even if the function has no symbol [closed]

IDA can do this: functions with no symbol are named 'sub_address'.
How can I do this at runtime?
In short, you disassemble all exported functions and look for call instructions. For each call instruction you take the address operand and mark it as another function, and disassemble it too. Ditto recursively.
Such functions, found from call operands, are what IDA calls sub_XXXX.
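As a rough sketch of that traversal in C - assuming a hypothetical decoder interface (decode, is_call, is_ret, call_target are stand-ins for whatever disassembler library you actually use), and simplified to follow only direct call operands and to stop at the first ret:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* hypothetical decoder interface -- replace with your disassembler */
    typedef struct { uint64_t addr; size_t len; } insn_t;
    insn_t   decode(const uint8_t *image, uint64_t addr);
    bool     is_call(const insn_t *i);
    bool     is_ret(const insn_t *i);
    uint64_t call_target(const insn_t *i);

    #define MAX_FUNCS 4096
    static uint64_t found[MAX_FUNCS];
    static size_t nfound;

    static bool seen(uint64_t addr) {
        for (size_t i = 0; i < nfound; i++)
            if (found[i] == addr) return true;
        return false;
    }

    /* mark addr as a function ("sub_XXXX") and follow its direct calls;
       a real scanner must also follow jumps and handle multiple exits */
    static void scan(const uint8_t *image, uint64_t addr) {
        if (seen(addr) || nfound == MAX_FUNCS) return;
        found[nfound++] = addr;
        for (insn_t i = decode(image, addr); !is_ret(&i);
             i = decode(image, i.addr + i.len))
            if (is_call(&i))
                scan(image, call_target(&i));
    }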
How to extract all functions out of a compiled ELF file, even if the function has no symbol
You don't define what a function is for you (and you really should).
Notice that if the compiler has inlined a function, it does not appear in the ELF file, even if of course it exists in the source code (the entire program could have been built with link-time optimization, e.g. g++ -flto -O2 at both compile and link time; then you would have many inlined functions, including several which are not marked inline in the source code).
The original source code could have been compiled with visibility tricks.
The software build might have used some code obfuscation techniques.
If some function is called only indirectly (think of a virtual method in C++, always called through some vtable; or think of some static function whose address is put into some function pointer variable or struct field) then you practically cannot detect it, since doing that reliably on the binary executable requires a precise analysis of all the possible (function pointer) values of some register or memory location (and that is undecidable, see Rice's theorem).
A program can also load a plugin at runtime (e.g. using dlopen) and call functions in it. It could also generate some machine code when running (e.g. with the help of GNU lightning, asmjit, libgccjit, etc...) and call such a generated function.
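Here's a trivial (illustrative, not from the question) C program showing why scanning call operands misses such functions: helper is only ever reached through a data pointer, so no direct "call helper" instruction appears in the binary.

    #include <stdio.h>

    static int helper(int x) { return x * 2; }  /* static: likely no symbol */

    int (*op)(int) = helper;    /* the address escapes into a data pointer */

    int main(void) {
        printf("%d\n", op(21)); /* indirect call: no direct call to helper */
        return 0;
    }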
So in general you cannot achieve your goal (especially if you assume that your "adversary", the software writer, uses clever techniques to make function extraction difficult). In general, decompilation is impossible (if you want it to be precise and complete).
However, arrowd's answer proposes a crude and incomplete approximation. You need to decide whether that is enough (even IDA gives approximate results).
Finally, in some legal systems, decompilation or reverse engineering of a binary executable is forbidden (even when technically possible); check the EULA or contract (or law) related to your binary software and your situation. You really should verify that what you are trying to do is legal (it might not be, and in some cases you could risk jail).
BTW, all these reasons are why I prefer to always use free software, whose source code is published and can be studied and improved. I willingly avoid proprietary software.

How can I know where a function ends in memory (get the address) - c/c++

I'm looking for a simple way to find where a function ends in memory. I'm working on a project that will find problems at run time in other code, such as code injection, viruses and so forth. My program will run alongside the code being checked at run time, so I will have access to memory. I don't have access to the source code itself; I would like to examine only specific functions from it. I need to know where functions start and end in the stack. I'm working with Windows 8.1, 64-bit.
In general, you cannot find where a function ends in memory, because the compiler could have optimized, inlined, cloned or removed that function, split it into different parts, etc. That function could be some system call mostly implemented in the kernel, or some function in an external shared library ("outside" of your program's executable)... From the point of view of the C11 standard (see n1570), your question makes no sense. That standard defines the semantics of the language, i.e. properties of the behavior of the produced program. See also explanations in this answer.
On some computers (Harvard architecture) the code would stay in a different memory, so there is no point in asking where that function starts or ends.
If you restrict your question to a particular C implementation (that is, a specific compiler with particular optimization settings, for a specific operating system, instruction set architecture and ABI) you might (in some cases, not all of them) be able to find the "end of a function" (but that won't be simple, and won't be foolproof). For example, you could post-process the assembler code and/or the object file produced by the compiler, inspect the ELF executable and its symbol table, examine DWARF debug information, etc...
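For instance, on Linux/ELF with glibc, a sketch of the symbol-table approach at runtime (an assumption-laden example: it needs -rdynamic and -ldl, only sees dynamically visible symbols, and st_size is 0 when the size wasn't recorded):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <link.h>
    #include <stdio.h>

    static void probe(void *fn) {
        Dl_info info;
        ElfW(Sym) *sym = NULL;
        /* glibc extension: also fetch the ELF symbol entry */
        if (dladdr1(fn, &info, (void **)&sym, RTLD_DL_SYMENT) && sym)
            printf("%s starts at %p, ends near %p\n",
                   info.dli_sname, info.dli_saddr,
                   (void *)((char *)info.dli_saddr + sym->st_size));
    }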
Your question smells a lot like some XY problem, so you should motivate it with a lot more explanation and context.
I need to know where functions start and end in the stack.
Functions don't sit on the stack, but mostly in the code segment of your executable (or library). What is on the call stack is a sequence of call frames. The organization of the call frames is specific to your ABI. Some compiler options (e.g. -fomit-frame-pointer) would make it difficult to explore the call stack (without access to the source code and help from the compiler).
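On glibc you can at least walk those call frames with backtrace(3); a minimal sketch (compile with -rdynamic so names resolve; inlining or -fomit-frame-pointer can degrade the output):

    #include <execinfo.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void show_stack(void) {
        void *frames[32];
        int n = backtrace(frames, 32);           /* capture call frames */
        char **names = backtrace_symbols(frames, n);
        for (int i = 0; i < n; i++)
            printf("%s\n", names[i]);            /* one line per frame */
        free(names);                             /* names is malloc'd */
    }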
I don't have access to the source code itself. I would like to examine only specific functions from it.
Your problem is still ill-defined, probably undecidable, and much more complex than you believe (since it is related to the halting problem), and there is considerable literature related to it (read about decompilers, static code analysis, anti-virus and malware analysis). I recommend spending several months or years learning more about compilers (start with the Dragon Book), linkers, instruction set architectures, ABIs. Then look into several proceedings of conferences related to ACM SIGPLAN etc. On the practical side, study the assembler code generated by compilers (e.g. use GCC with gcc -O2 -S -fverbose-asm ...); the CppCon 2017 talk by Matt Godbolt, “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”, is a nice introduction.
I'm working on a project that will find problems on run time in other code, such as: code injection, viruses and so fourth.
I hope you can dedicate several years of full-time work to your ambitious project. It is probably much more difficult than you thought, because optimizing compilers are much more complex than you believe (and malware uses various complex tricks to hide itself from inspection). Malware research is really difficult, but interesting.

Is it true that it is common for programs written in C to contain assembly code?

I have read this, which says:
For example, it is common for programs that are written primarily in C to contain portions that are in an assembly language for optimization of processor efficiency.
I have never seen a program written primarily in C that also contains assembly code, at least not directly as source code - apart from their example, the Linux kernel.
Is this statement true and if so, how could it possibly optimize processor efficiency?
Isn't C code just translated into assembly code by the compiler?
No, it's not true. I'd estimate that less than 1% of C programmers even know how to program in assembly, and the need to use it is very rare. It's generally only needed for very special applications, such as some parts of an OS kernel or programming embedded systems, because they need to perform machine operations that don't have corresponding C code (such as directly manipulating CPU registers). In earlier days some programmers would use it for performance-critical sections of code, but compiler optimizations have improved significantly, and CPUs have gotten faster, so this is rarely needed now. It might still be used in the built-in libraries, so that functions like strcpy() will be as fast as possible. But application programmers almost never have to resort to assembly.
Isn't C code just translated into assembly code by the compiler?
Yes, but...
There are situations where you may want to access a specific register or other platform-specific location, and Standard C doesn't provide good ways to do that. If you want to look at a status word or load/read a data register directly, then you often need to drop down to the assembler level.
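For example, reading the x86 time-stamp counter has no Standard C equivalent; with GNU C inline assembly (a GCC/Clang-specific, x86-only sketch) it looks like this:

    #include <stdint.h>

    /* read the x86 time-stamp counter, an instruction with no C syntax */
    static inline uint64_t rdtsc(void) {
        uint32_t lo, hi;
        __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }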
Also, even in this age of very smart optimizing compilers, it's still possible for a human assembly programmer to write assembly code that will out-perform code generated by the compiler. If you need to wring every possible cycle out of your code, you may need to "go manual" for a couple of routines.

Is it a good idea to compile a language to C?

All over the web, I am getting the feeling that writing a C backend for a compiler is not such a good idea anymore. GHC's C backend is not being actively developed anymore (this is my unsupported feeling). Compilers are targeting C-- or LLVM.
Normally, I would think that GCC is a good old mature compiler that performs well at optimizing code; therefore compiling to C would use the maturity of GCC to yield better and faster code. Is this not true?
I realize that the question greatly depends on the nature of the language being compiled and on other factors, such as getting more maintainable code. I am looking for a rather general answer (w.r.t. the compiled language) that focuses solely on performance (disregarding code quality, etc.). I would also be really glad if the answer included an explanation of why GHC is drifting away from C and why LLVM performs better as a backend (see this), or any other examples of compilers doing the same that I am not aware of.
Let me list my two biggest problems with compiling to C. Whether these are problems for your language depends on what kind of features you have.
Garbage collection: When you have garbage collection, you may have to interrupt regular execution at just about any point in the program, and at this point you need access to all pointers that point into the heap. If you compile to C you have no idea where those pointers might be. C is responsible for local variables, arguments, etc. The pointers are probably on the stack (or maybe in other register windows on a SPARC), but there is no real access to the stack. And even if you scan the stack, which values are pointers? LLVM actually addresses this problem (though I don't know how well, since I've never used LLVM with GC).
Tail calls: Many languages assume that tail calls work (i.e., that they don't grow the stack); Scheme mandates it, Haskell assumes it. This is not the case with C. Under certain circumstances you can convince some C compilers to do tail calls. But you want tail calls to be reliable, e.g., when tail-calling an unknown function. There are clumsy workarounds, like trampolining, but nothing quite satisfactory.
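To make the trampolining idea concrete, here is a minimal C sketch (types simplified for illustration): each "tail call" returns the next step instead of making it, and a flat driver loop performs the calls, so the C stack never grows.

    #include <stdbool.h>
    #include <stdio.h>

    /* a suspended tail call: which function to run next, plus its state */
    typedef struct cont {
        struct cont *(*fn)(struct cont *);
        long n;
        bool result;
    } cont;

    static cont *is_even(cont *k);
    static cont *is_odd(cont *k);

    static cont *is_even(cont *k) {
        if (k->n == 0) { k->result = true;  k->fn = NULL;   }
        else           { k->n -= 1;         k->fn = is_odd; }
        return k;
    }

    static cont *is_odd(cont *k) {
        if (k->n == 0) { k->result = false; k->fn = NULL;    }
        else           { k->n -= 1;         k->fn = is_even; }
        return k;
    }

    int main(void) {
        cont k = { is_even, 1000001, false };
        while (k.fn)
            k.fn(&k);          /* the trampoline: flat C stack, no growth */
        printf("%s\n", k.result ? "even" : "odd");
        return 0;
    }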
While I'm not a compiler expert, I believe that it boils down to the fact that you lose something in translation to C as opposed to translating to e.g. LLVM's intermediate language.
If you think about the process of compiling to C, you create a compiler that translates to C code, then the C compiler translates to an intermediate representation (the in-memory AST), then translates that to machine code. The creators of the C compiler have probably spent a lot of time optimizing certain human-made patterns in the language, but you're not likely to be able to create a fancy enough compiler from a source language to C to emulate the way humans write code. There is a loss of fidelity going to C - the C compiler doesn't have any knowledge about your original code's structure. To get those optimizations, you're essentially back-fitting your compiler to try to generate C code that the C compiler knows how to optimize when it's building its AST. Messy.
If, however, you translate directly to LLVM's intermediate language, that's like compiling your code to a machine-independent high-level bytecode, which is akin to the C compiler giving you access to specify exactly what its AST should contain. Essentially, you cut out the middleman that parses the C code and go directly to the high-level representation, which preserves more of the characteristics of your code by requiring less translation.
Also related to performance, LLVM can do some really tricky stuff for dynamic languages like generating binary code at runtime. This is the "cool" part of just-in-time compilation: it is writing binary code to be executed at runtime, instead of being stuck with what was created at compile time.
Part of the reason for GHC's moving away from the old C backend was that the code produced by GHC was not code gcc could optimise particularly well. So with GHC's native code generator getting better, there was less return for a lot of work. As of 6.12, the NCG's code was only slower than the C-compiled code in very few cases, so with the NCG getting even better in ghc-7, there was no sufficient incentive to keep the gcc backend alive. LLVM is a better target because it's more modular, and one can do many optimisations on its intermediate representation before passing the result to it.
On the other hand, last I looked, JHC still produced C and the final binary from that, typically (exclusively?) by gcc. And JHC's binaries tend to be quite fast.
So if you can produce code the C compiler handles well, that is still a good option, but it's probably not worth jumping through too many hoops to produce good C if you can more easily produce good executables via another route.
One point that hasn't been brought up yet is, how close is your language to C? If you're compiling a fairly low-level imperative language, C's semantics may map very closely to the language you're implementing. If that's the case, it's probably a win, because the code written in your language is likely to resemble the kind of code someone would write in C by hand. That was definitely not the case with Haskell's C backend, which is one reason why the C backend optimized so poorly.
Another point against using a C backend is that C's semantics are actually not as simple as they look. If your language diverges significantly from C, using a C backend means you're going to have to keep track of all those infuriating complexities, and possibly differences between C compilers as well. It may be easier to use LLVM, with its simpler semantics, or devise your own backend, than keep track of all that.
Aside from all the code-generator quality reasons, there are also other problems:
1) The free C compilers (gcc, clang) are a bit Unix-centric.
2) Supporting more than one compiler (e.g. gcc on Unix and MSVC on Windows) requires duplication of effort.
3) Compilers might drag in runtime libraries (or even *nix emulations) on Windows that are painful. Two different C runtimes (e.g. Linux libc and msvcrt) to base on complicate your own runtime and its maintenance.
4) You get a big externally versioned blob in your project, which means a major version transition (e.g. a change of mangling that hurts your runtime lib, or ABI changes like a change of alignment) might require quite some work. Note that this goes for the compiler AND the externally versioned (parts of the) runtime library. And multiple compilers multiply this. This is not so bad for C as a backend, though, as in the case where you directly connect to (read: bet on) a backend, like being a gcc/llvm frontend.
5) In many languages that follow this path, you see C-isms trickle through into the main language. Of course this won't happen to you, but you will be tempted :-)
6) Language functionality that doesn't directly map to standard C (like nested procedures, and other things that need stack fiddling) is difficult.
7) If something is wrong, users will be confronted with C-level compiler or linker errors that are outside their field of experience. Parsing them and making them your own is painful, especially with multiple compilers and versions.
Note that point 4 also means that you will have to invest time just to keep things working when the external projects evolve. That is time that generally doesn't really go into your project, and since the project is more dynamic, multiplatform releases will need a lot of extra release engineering to cater for change.
So in short, from what I've seen, while such a move allows a swift start (getting a reasonable code generator for free for many architectures), there are downsides. Most of them are related to loss of control and the poor Windows support of *nix-centric projects like gcc. (LLVM is too new to say much about the long term, but their rhetoric sounds a lot like gcc's did ten years ago.) If a project you are hugely dependent on keeps a certain course (like GCC moving to Win64 awfully slowly), then you are stuck with it.
First, decide whether you want serious non-*nix support (OS X being more unixy), or only a Linux compiler with a MinGW stopgap for Windows. A lot of compilers need first-rate Windows support.
Second, how finished must the product become? What's the primary audience? Is it a tool for the open source developer who can handle a DIY toolchain, or do you want to target a beginner market (like many 3rd-party products, e.g. RealBasic)?
Or do you really want to provide a well rounded product for professionals with deep integration and complete toolchains?
All three are valid directions for a compiler project. Ask yourself what your primary direction is, and don't assume that more options will become available in time. E.g. evaluate where projects that chose to be a GCC frontend in the early nineties are now.
Essentially the Unix way is to go wide (maximize platforms).
The complete suites (like VS and Delphi, the latter of which recently also started to support OS X and has supported Linux in the past) go deep and try to maximize productivity (supporting especially the Windows platform nearly completely, with deep levels of integration).
The 3rd party projects are less clear cut. They go more after self-employed programmers, and niche shops. They have less developer resources, but manage and focus them better.
As you mentioned, whether C is a good target language depends very much on your source language. So here's a few reasons where C has disadvantages compared to LLVM or a custom target language:
Garbage Collection: A language that wants to support efficient garbage collection needs to know extra information that interferes with C. If an allocation fails, the GC needs to find which values on the stack and in registers are pointers and which aren't. Since the register allocator is not under our control we need to use more expensive techniques such as writing all pointers to a separate stack. This is just one of many issues when trying to support modern GC on top of C. (Note that LLVM also still has some issues in that area, but I hear it's being worked on.)
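A minimal sketch of that separate-stack technique, assuming a hypothetical allocating cons(): the generated C mirrors every live heap pointer onto an explicit shadow stack that the collector scans as its root set.

    #include <stddef.h>

    #define SHADOW_MAX 1024
    static void *shadow[SHADOW_MAX];     /* root set the GC can walk */
    static size_t shadow_top;

    #define GC_PUSH(p) (shadow[shadow_top++] = (p))
    #define GC_POP(n)  (shadow_top -= (n))

    void *cons(void *car, void *cdr);    /* hypothetical allocator that may collect */

    void *example(void *a, void *b) {
        GC_PUSH(a);                      /* keep a and b visible to the GC ... */
        GC_PUSH(b);
        void *pair = cons(a, b);         /* ... across a potentially-collecting call */
        GC_POP(2);
        return pair;
    }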
Feature mapping & language-specific optimisations: Some languages rely on certain optimisations, e.g., Scheme relies on tail-call optimisation. Modern C compilers can do this, but are not guaranteed to, which could cause problems if a program relies on it for correctness. Another feature that could be difficult to support on top of C is co-routines.
Most dynamically typed languages also cannot be optimised well by C compilers. For example, Cython compiles Python to C, but the generated C uses calls to many generic functions which are unlikely to be optimised well even by the latest GCC versions. Just-in-time compilation à la PyPy/LuaJIT/TraceMonkey/V8 is much better suited to give good performance for dynamic languages (at the cost of much higher implementation effort).
Development Experience: Having an interpreter or JIT can also give you a much more convenient experience for developers -- generating C code, then compiling it and linking it, will certainly be slower and less convenient.
That said, I still think it's a reasonable choice to use C as a compilation target for prototyping new languages. Given that LLVM was explicitly designed as a compiler backend, I would only consider C if there are good reasons not to use LLVM. If the source-level language is very high-level, though, you most likely need an earlier, higher-level optimisation pass, as LLVM is indeed very low-level (e.g., GHC performs most of its interesting optimisations before calling into LLVM). Oh, and if you're prototyping a language, using an interpreter is probably easiest - just try to avoid features that rely too much on being implemented by an interpreter.
Personally I would compile to C. That way you have a universal intermediary language and don't need to be concerned about whether your compiler supports every platform out there. Using LLVM might get some performance gains (although I would argue the same could probably be achieved by tweaking your C code generation to be more optimizer-friendly), but it will lock you in to only supporting targets LLVM supports, and having to wait for LLVM to add a target when you want to support something new, old, different, or obscure.
As far as I know, C can't query or manipulate processor flags.
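Indeed, Standard C has no portable access to, say, the carry flag; the usual escape hatch is compiler-specific inline assembly. A GCC/Clang, x86-only sketch:

    #include <stdint.h>

    /* returns the carry flag produced by a + b; stores the 32-bit sum */
    static inline int add_with_carry(uint32_t a, uint32_t b, uint32_t *sum) {
        uint8_t carry;
        __asm__ ("addl %2, %1\n\t"   /* a += b, setting the flags */
                 "setc %0"           /* capture the carry flag */
                 : "=q"(carry), "+r"(a)
                 : "r"(b)
                 : "cc");
        *sum = a;
        return carry;
    }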
This answer is a rebuttal to some of the points made against C as a target language.
Tail call optimizations
Any function that can be tail call optimized is actually equivalent to an iteration (it's an iterative process, in SICP terminology). Additionally, many recursive functions can and should be made tail recursive, for performance reasons, by using accumulators etc.
Thus, in order for your language to guarantee tail call optimization, you would have to detect it and simply not map those functions to regular C functions - but instead create iterations from them.
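For instance, a tail-recursive sum in the source language can be emitted as a plain C loop rather than a C call; a minimal sketch of that mapping:

    /* source language (tail recursive):
       sum(n, acc) = if n == 0 then acc else sum(n - 1, acc + n) */
    long sum(long n, long acc) {
        for (;;) {            /* the tail call becomes a jump */
            if (n == 0)
                return acc;
            acc += n;         /* rebind the "arguments" ... */
            n -= 1;           /* ... and iterate instead of calling */
        }
    }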
Garbage collection
It can actually be implemented in C. You can create a run-time system for your language which consists of some basic abstractions over the C memory model - using, for example, your own memory allocators, constructors, special pointers for objects in the source language, etc.
For example, instead of employing regular C pointers for the objects in the source language, a special structure could be created, over which a garbage collection algorithm could be implemented. The objects in your language (more accurately, references) could behave just like in Java, but in C they could be represented along with meta-information (which you wouldn't have if you were working just with pointers).
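One possible shape for such a structure (a sketch only; real collectors carry more, e.g. type tags or forwarding pointers):

    /* every source-language object starts with a header the GC understands */
    typedef struct gc_header {
        struct gc_header *next;   /* intrusive list of all allocations (for sweep) */
        unsigned marked;          /* mark bit for mark-and-sweep */
        unsigned num_fields;      /* how many gc_header* fields follow */
    } gc_header;

    /* a source-language "reference" is then always a gc_header*, never a
       raw C pointer, so the runtime can find and trace every object */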
Of course, such a system could have problems integrating with existing C tooling - depends on your implementation and trade-offs that you're willing to make.
Lacking operations
hippietrail noted that C lacks rotate operators (by which I assume he meant circular shifts) that are supported by processors. If such operations are available in the instruction set, they can be added using inline assembly.
The frontend would in this case have to detect the architecture it's targeting and provide the proper snippets. Some kind of fallback in the form of a regular function should also be provided.
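In practice inline assembly is often unnecessary here: GCC and Clang recognize the portable shift-and-mask idiom below and emit a single rotate instruction where one exists, and the same function serves as the fallback elsewhere. A sketch:

    #include <stdint.h>

    static inline uint32_t rotl32(uint32_t x, unsigned r) {
        r &= 31;   /* avoids undefined behaviour when r is 0 or >= 32 */
        return (x << r) | (x >> ((32 - r) & 31));
    }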
This answer seems to be addressing some core issues seriously. I'd like to see some more substantiation on which problems exactly are caused by C's semantics.
There's a particular case worth mentioning: writing a programming language with strong security* or reliability requirements.
For one, it would take you years to learn a big enough subset of C well enough to know that all the C constructs you choose to employ in your compilation are safe and don't invoke undefined behaviour. Secondly, you'd then have to find an implementation of C that you can trust (which would mean a tiny trusted code base, and probably won't be very efficient). Not to mention you'll need to find a trusted linker, an OS capable of executing compiled C code, and some basic libraries, all of which would need to be well-defined and trusted.
So in this case you might as well use either assembly language or, if you care about machine independence, some intermediate representation.
*please note that "strong security" here is not related at all to what banks and IT businesses claim to have
Is it a good idea to compile a language to C?
No.
...which raises one obvious question: why do some still think compiling via C is a good idea?
Two big arguments in favour of misusing C in this fashion are that it's stable and standardised:
For GCC, it's C or bust (but there is work underway which may allow other options).
For LLVM, there's the routine breaking of backwards-compatibility in its IR and APIs - what would you prefer: spending time on improving your project or chasing after LLVM?
Providing little more than a promise of stability is somewhat ironic, considering the intended purpose of LLVM.
For these and other reasons, there are various half-done, toy-like, lab-experiment, single-site/use, and otherwise-ignominious via-C backends scattered throughout cyberspace - being abandoned, most have succumbed to bit-rot. But there are some projects which do manage to progress to the mainstream, and their success is then used by via-C supporters to further perpetuate this fantasy.
But if you're one of those supporters, feel free to make fantasy into reality - there's that work happening in GCC, or the resurrected LLVM backend for C. Just imagine it: two well-built, well-maintained via-C backends into which the sum of all prior knowledge can be directed.
They just need you.

How to create a C compiler for custom CPU?

What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?
Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?
Preferably the compiler itself would be written in C, and build as a native executable for either Linux or Windows.
Please note: I am not asking how to write the compiler itself. I did take that course in college, I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language, I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to myself and anyone else who might make similar assumptions.
Quick overview/tutorial on writing an LLVM backend.
This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.
[ . . . ]
To create a static compiler (one that emits text assembly), you need to implement the following:
Describe the register set.
Describe the instruction set.
Describe the target machine.
Implement the assembly printer for the architecture.
Implement an instruction selector for the architecture.
There's the concept of a cross-compiler, i.e., one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.
Edit: I just spotted a question a few years ago on a GCC mailing list on how to add a new target and someone pointed to this
vbcc (at www.compilers.de) is a good and simple retargetable C-compiler written in C. It's much simpler than GCC/LLVM. It's so simple I was able to retarget the compiler to my own CPU with a few weeks of work without having any prior knowledge of compilers.
The short answer is that it doesn't work that way.
The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):
1) Syntactic analysis (lexer and parser), and, for C, preprocessing, leading to an abstract syntax tree.
2) Type checking, leading to an annotated abstract syntax tree.
3) Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
4) Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.
In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.
1) Short answer: no. There's no such thing as a "compiler framework" where you can just add water (plug in your own assembly set), stir, and be done.
2) Longer answer: it's certainly possible. But challenging. And likely expensive.
If you wanted to do it yourself, I'd start by looking at GNU CC. It's already available for a large variety of CPUs and platforms.
3) Take a look at this link for more ideas (including the idea of "just build a library of functions and macros"), that would be my first suggestion:
http://www.instructables.com/answers/Custom-C-Compiler-for-homemade-instruction-set/
You can modify existing open source compilers such as GCC or Clang. Other answers have provided you with links about where to learn more. But these compilers are not designed to be easily retargeted; they are merely "easier" to retarget than other compilers wired for specific targets.
But if you want a compiler that is relatively easy to retarget, you want one in which you can specify the machine architecture in explicit terms, and some tool generates the rest of the compiler (GCC does a bit of this; I don't think Clang/LLVM does much but I could be wrong here).
There's a lot of this in the literature, google "compiler-compiler".
But for a concrete solution for C, you should check out ACE, a compiler vendor that generates compilers on demand for customers. Not free, but I hear they produce very good compilers very quickly. I think it produces standard style binaries (ELF?) so it skips the assembler stage. (I have no experience or relationship with ACE.)
If you don't care about code quality, you can likely write a syntax-directed translation of C to assembler using a C AST. You can get C ASTs from GCC, Clang, maybe ANTLR, and from our DMS Software Reengineering Toolkit.
