Making assembly function inline in x64 Visual Studio


I know that the MSVC compiler in x64 mode does not support inline assembly snippets, and that in order to use assembly code you have to define your function in an external my_asm_funcs.asm file, like this:
.code
my_asm_func PROC
    mov rax, rcx    ; Windows x64: first argument arrives in RCX, result is returned in RAX
    ret
my_asm_func ENDP
END
Then, in your .c or .h file, you declare a prototype for the function, like this:
int my_asm_func(int x);
Although that solution addresses many concerns, I am still interested in making that assembly function inline. In other words, after compilation I don't want any "calls" to my_asm_func; I just want this piece of assembly to be glued into my final compiled code. I tried declaring the function with the inline and __forceinline keywords, but nothing seems to help. Is there still any way to do what I want?

No, there is no way to do what you want.
Microsoft's compiler doesn't support inline assembly for x86-64 targets, as you said. This forces you to define your assembly functions in an external code module (*.asm), assemble them with MASM, and link the result together with your separately-compiled C/C++ code.
The required separation of steps means that the C/C++ compiler cannot inline your assembly functions because they are not visible to it at the time of compilation.
Even with link-time code generation (LTCG) enabled, your assembly module(s) will not get inlined because the linker simply doesn't support this.
There is absolutely no way to get assembly functions written in a separate module inlined directly into C or C++ code.
There is no way that the inline or __forceinline keywords could do anything. In fact, there's no way that you could use them without a compiler error (or at least a warning). These annotations have to go on the function's definition (which, for an inline function, is the same as its declaration), but you can't put it on the function's definition, since that's defined in a separate *.asm file. These aren't MASM keywords, so trying to add them to the definition would necessarily result in an error. And putting them on the forward declaration of the assembly function in the C header is going to be similarly unsuccessful, since there's no code there to inline—just a prototype.
This is why Microsoft recommends using intrinsics. You can use these directly in your C or C++ code, and the compiler will emit the corresponding assembly code automatically. Not only does this accomplish the desired inlining, but intrinsics even allow the optimizer to function, further improving the results. No, intrinsics do not lead to perfect code, and there aren't intrinsics for everything, but it's the best you can do with Microsoft's compiler.
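As a minimal sketch of the intrinsic route (assuming a byte swap is the kind of operation you wanted to write in assembly; the names swap_bytes and BSWAP32 are hypothetical), the MSVC intrinsic _byteswap_ulong and the equivalent GCC/Clang builtin __builtin_bswap32 can be called from ordinary C:

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <stdlib.h>                      /* MSVC: declares the _byteswap_ulong intrinsic */
#define BSWAP32(x) _byteswap_ulong(x)
#else
#define BSWAP32(x) __builtin_bswap32(x)  /* GCC/Clang: equivalent compiler builtin */
#endif

/* Reverse the byte order of a 32-bit value. Because the compiler knows what
   the intrinsic means, it can expand it in place (typically as a single BSWAP
   instruction on x86-64) and optimize around it; no call is emitted. */
static inline uint32_t swap_bytes(uint32_t v)
{
    return (uint32_t)BSWAP32(v);
}
```

For example, swap_bytes(0x11223344u) yields 0x44332211u. Because swap_bytes is an ordinary static inline function, every call site can be expanded in place, which an external .asm procedure cannot offer.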
Your only other alternative is to sit down and play with various permutations of C/C++ code until you get the compiler to generate the desired object code. This can be very powerful in cases where intrinsics are not available for the instructions that you wish to be generated, but it does take a lot of time spent fidgeting, and you'll have to revisit it to make sure it continues to do what you want when you upgrade compiler versions.

Since the title mentions Visual Studio and not MSVC, I recommend installing Clang via the Visual Studio Installer. It can be used just like MSVC without needing to configure custom build tasks or anything and it supports inline assembly with Intel syntax and variables as operands.
Just select "LLVM (clang-cl)" in Platform Toolset from the General section of the property pages in your project and you're good to go.
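A minimal sketch of what this looks like, assuming clang-cl targeting x64, which accepts MSVC-style __asm blocks (Intel syntax, with C variables used directly as operands). add_ints is a hypothetical example function, and any other compiler takes the plain C fallback:

```c
/* Add two ints. Under clang-cl (which defines both _MSC_VER and __clang__)
   targeting x64, the MSVC-style __asm block is used; elsewhere, plain C. */
int add_ints(int a, int b)
{
#if defined(_MSC_VER) && defined(__clang__) && defined(_M_X64)
    int r;
    __asm {
        mov eax, a      ; load the C variable a
        add eax, b      ; add the C variable b
        mov r, eax      ; store the sum into the C variable r
    }
    return r;
#else
    return a + b;       /* fallback for compilers without MS-style x64 asm */
#endif
}
```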

Yes, you can. Assemble your procedure as shellcode, extract the bytes, and place them in a buffer with RWX (read/write/execute) memory protection in your code. Then call the code through a function pointer.

Related

How to write inline Assembly with Turbo C 2.01?

I want to write some inline assembly in a DOS program which is compiled using Turbo C 2.01. When I write
asm {
nop
}
the compiler claims that in-line assembly is not allowed in function ....
Any ideas?

See the Turbo C user manual page 430:
Inline assembly not allowed
Your source file contains inline assembly language statements and you are compiling it from within the Integrated Environment. You must use the TCC command to compile this source file.
I believe that you also need to pass the -B option to TCC (page 455).
Alternatively you can use __emit__ (page 103) for relatively simple code entered as machine code rather than assembler mnemonics.
It seems an odd restriction to not allow inline assembly in the IDE. You might consider "upgrading" to Turbo C++ 3.0 which I believe does allow it. I would imagine that TC++ will compile C code when presented with a .c file, or that the IDE can be set to compile C explicitly. There's a manual for that too.

Turbo C converts C code directly into machine code without an assembler phase, and thus cannot include assembly language source within a program. What it can do, however, is use the __emit__ directive to insert machine code. The cleanest way to use that is probably to use a separate assembler (or perhaps DEBUG) to process the code of interest by itself into a COM file, and then enter the byte values therein into an __emit__ directive. Parameters are stored in ascending order left to right, starting at either BP+4 (in tiny, small, or compact model) or BP+6 (medium, large, or huge). Local variables are stored at addresses below BP.
When using Turbo Pascal, it's possible to use a handy program called "inline assembler" to convert assembly-language source into a Turbo Pascal literal-code directive. Turbo Pascal's directive is formatted differently from C's (I like Pascal's better) and can accommodate labels in ways Turbo C's cannot. Still, using __emit__ may have far less impact on build times than trying to use inline assembly code.

Can I write (x86) assembly language which will build with both GCC and MSVC?

I have a project which is entirely written in C. The same C files can be compiled using either GCC for Linux or MSVC for Windows. For performance reasons, I need to re-write some of the code as x86 assembly language.
Is it possible to write this assembly language as a source file which will build with both the GCC and MSVC toolchains? Alternatively, if I write an assembly source file for one toolchain, is there a tool to convert it to work with the other?
Or, am I stuck either maintaining two copies of the assembly source code, or using a third-party assembler such as NASM?

I see two problems:
masm and gas have different syntax. gas can be configured to use Intel syntax with the .intel_syntax noprefix directive, but even then small differences remain (such as different directives). A possible approach is to preprocess your assembly source with the C preprocessor, using macros for all directives that differ between the two. This also has the advantage of providing a unified comment syntax.
However, just using a portable third party assembler like nasm is likely to be less of a hassle.
Linux and Windows have different calling conventions. A possible solution for x86-32 is to stick to a well-supported calling convention like stdcall. You can tell gcc what calling convention to use when calling a function using function attributes. For example, this would declare foo to use the stdcall calling convention:
extern int foo(int x, int y) __attribute__((stdcall));
You can do the same thing in MSVC with the __stdcall keyword, solving this issue.
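For example, a hypothetical STDCALL macro can hide the difference between the two spellings, so one header serves both toolchains (a sketch, assuming 32-bit x86 is the target that matters; foo is the example name from above):

```c
/* Expands to the stdcall annotation each compiler understands on 32-bit x86,
   and to nothing on other targets. */
#if defined(_MSC_VER)
#define STDCALL __stdcall                 /* MSVC keyword (ignored on x64) */
#elif defined(__GNUC__) && defined(__i386__)
#define STDCALL __attribute__((stdcall))  /* GCC/Clang attribute */
#else
#define STDCALL                           /* other targets: default convention */
#endif

/* The same declaration now works with both toolchains; an assembly
   implementation of foo would have to follow stdcall on x86-32. */
int STDCALL foo(int x, int y);

/* A C definition, only so this example links and runs on its own. */
int STDCALL foo(int x, int y)
{
    return x + y;
}
```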
On x86-64, a similar solution is likely possible, but I'm not exactly sure what attributes you have to set.
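For what it's worth, GCC and Clang do provide such attributes on x86-64: ms_abi and sysv_abi select the Windows x64 or System V convention for an individual function. A minimal sketch, with WIN64_ABI and bar as hypothetical names:

```c
/* On GCC/Clang x86-64, force the Windows x64 calling convention for selected
   functions; MSVC already uses it by default, so the macro expands to nothing. */
#if defined(__GNUC__) && defined(__x86_64__)
#define WIN64_ABI __attribute__((ms_abi))
#else
#define WIN64_ABI
#endif

/* An assembly routine written once for the Windows x64 convention could be
   declared like this on Linux, too. Here bar is defined in C so it runs. */
int WIN64_ABI bar(int x, int y);

int WIN64_ABI bar(int x, int y)
{
    return x * y;
}
```

The C compiler takes care of the convention mismatch at each call site, so callers compiled with the default System V convention can call bar normally.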
You can of course also use the same cpp-approach as for the first problem to generate slightly different function prologues and epilogues depending on what calling convention you need. However, this might be less maintainable.

Why doesn't (can't) the OS translate C code directly into machine language instead of first translating it into assembly language?

As far as I understand, when a program (written in C, for example) is compiled, it is first translated into assembly language and then into machine language. Why can't the "assembly language step" be skipped, and why isn't it?

Your understanding is wrong: compilers do not necessarily translate C code into assembler. They usually perform several phases and have internal representations, but these don't necessarily resemble human-readable assembler.
Here is a nice introduction to LLVM. LLVM is the compiler toolkit that clang is built on.

It is easier for the compiler developers.
It is possible to write a compiler that reads C and writes object code. However, this requires the compiler writer to write all the computations that encode instructions. Instruction encodings are intricate on some machines. Additionally, there are fields to fill in that depend on other interactions, such as how far away a branch target is, which depends on what instructions are between the branch and the target.
Additionally, part of the way a compiler is written is with patterns that say things like “To increment an object x, issue an increment instruction.” In order to write object code directly, you have to encode all the instructions you want to write into those patterns. That means your patterns must have some sort of language for describing instructions.
Well, we already have a language for that: assembly language. So it is simply easier to write your patterns in ways like “To increment an object x, issue inc x.”
Modern compilers have many layers. There is a front end that reads C text (or other languages) and turns it into a language internal to the compiler. There is an optimizer that operates on the internal language (or a representation of it) and tries to improve the code. There is a back end that turns the internal language into assembly language. There is an assembler that turns the assembly into object code. And there is a linker that links object code into an executable file.
As with many complex tasks, it is simply easier for human minds to work with a complex task when it is separated into nice pieces. This reduces bugs and improves the time it takes to work with software. It also makes software flexible, because we can change the front end to support a new language (e.g., Java instead of C) or change the back end to support a new processor (change from Intel assembly to PowerPC assembly). And changing one optimizer improves all the compilers, for Java and C and Intel and PowerPC.
The gcc command that we use to compile is actually just a driver that calls other programs that perform the front-end processing, the optimization, the assembly, and the linking. You can also call most of these phases separately, or use a switch (-v, or -### to print the commands without running them) to tell gcc to show you the commands it is using.
Additionally, GCC has a feature that allows developers to insert assembly language directly intermixed with the C code. This compels GCC to include an assembler.

The operating system does not do anything like that. This is the job of the compiler. And in fact, many compilers do emit object files directly; you have to explicitly ask them to emit assembly code. Others choose not to, because emitting a fully-featured object file requires expert knowledge about the various formats that exist for this. Assemblers have various convenience features that make the job easier, and can (sometimes) target multiple object file formats without changes to the assembly code. Also, emitting annotated assembly code is a very useful feature, so not writing a separate code generator just for direct object file emission saves you time without any restrictions (except needing an assembler), which makes it an attractive option when you have limited resources.

Depends on the compiler; there is no actual need for the assembly code.
Maybe the authors of whatever compiler you are talking about (GNU-CC?) considered it slightly easier for themselves if they didn't have to resolve certain things like branches themselves.

Assembly code is purely a convenient, somewhat-human-readable representation of the machine code and the symbolic references and relocations needed by the linker when putting together the output of different translation units. Without an intermediate assembly-language step, the compiler would also be responsible for generating the relocations in the form the linker needs, which is doable, but painful. Since an assembler with this capability already exists for processing hand-written assembly code, it makes sense to use it.

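There is often no separate assembler stage. MSVC (cl.exe) produces machine code (.obj) right away, and Clang uses its integrated assembler to write .o files directly. GCC is the notable exception: it emits assembly text and then runs the separate assembler, as.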

A cross compiler can generate machine code directly, without help from the OS on which the cross compiler is installed.
For example, the Tornado package installed on Windows can generate machine code for VxWorks.

How does C code call assembly code (e.g. optimized strlen)?

I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.
So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?

The C standard dictates what each library function must do rather than how it is implemented.
Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
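For instance, a plain C implementation could be as simple as the following sketch (simple_strlen is a hypothetical name, not any particular library's code). A library may instead ship a hand-tuned assembly version with the same signature, and callers cannot tell the difference:

```c
#include <stddef.h>

/* Count characters up to, but not including, the terminating '\0'. */
size_t simple_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);  /* distance from start to the terminator */
}
```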
Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilities have to be included as extensions to the C standard.

At the end of the road compiled programs and programs in assembly are all machine language, so they can call each other. The way this is done is by having the assembly code use the same calling conventions (way to prepare for a call, prepare parameters and such) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.

Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.
First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.
"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
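To make that concrete, here is a minimal sketch assuming GCC or Clang on an x86-64 ELF target such as Linux: a function written in assembly (using top-level asm, whose text is passed verbatim to the assembler) that follows the System V calling convention, and is then called from C through an ordinary prototype. asm_add is a hypothetical name:

```c
/* C only sees an ordinary prototype; the call site cannot tell that the
   body was written in assembly. */
int asm_add(int a, int b);

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__) && defined(__ELF__)
/* System V x86-64: first int argument in EDI, second in ESI, result in EAX. */
__asm__(
    ".pushsection .text\n"
    ".globl asm_add\n"
    ".type asm_add, @function\n"
    "asm_add:\n"
    "    leal (%rdi,%rsi), %eax\n"   /* eax = low 32 bits of rdi + rsi */
    "    ret\n"
    ".size asm_add, .-asm_add\n"
    ".popsection\n");
#else
/* Fallback so the example still builds on other targets. */
int asm_add(int a, int b) { return a + b; }
#endif
```

The same idea applies to a separate .s file assembled on its own: as long as the code obeys the platform's calling convention, the linker can join it with C-generated object code.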
The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.
I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.

When C code is compiled by gcc, it is first compiled to assembler instructions, which are then assembled into a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S.
Hand-written assembler code simply skips the first (C-to-assembler) stage, and after assembly it is indistinguishable from code compiled from C.

One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
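As a minimal sketch of the GCC syntax, assuming GCC or Clang targeting x86 (add_inline_asm is a hypothetical name; other compilers and targets use the plain C fallback):

```c
/* GCC/Clang extended inline assembly (AT&T syntax by default). The
   constraints tell the compiler where the operands live, so the asm
   statement is inlined and register-allocated like ordinary code. */
static inline int add_inline_asm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    int result;
    __asm__("addl %2, %0"       /* result += b */
            : "=r"(result)      /* %0: output in any general register */
            : "0"(a), "r"(b)    /* %1: a, in the same register as %0; %2: b */
            : "cc");            /* ADD modifies the flags */
    return result;
#else
    return a + b;
#endif
}
```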

You can write inline assembly in your C code. The syntax for this is highly compiler specific, but the asm keyword is usually used. Look into inline assembly for more information.

inline a function inside another inline function in C

I currently have inline functions calling another inline function (a simple, 4-line getAbs() function). However, by looking at the assembler code, I discovered that the "big" inline functions are inlined properly, but the compiler uses a bl (branch-and-link) instruction to call the getAbs() function.
Is it not possible to inline a function in another inline function? By the way, this is embedded code, we are not using the standard libraries.
Edit: The compiler is Wind River's, and I have already checked that inlining would be beneficial (4 instructions instead of roughly 40).

Depending on what compiler you are using you may be able to encourage the compiler to be less reluctant to inline, e.g. with gcc you can use __attribute__ ((always_inline)), with Intel ICC you can use icc -inline-level=1 -inline-forceinline, and with Apple's gcc you can use gcc -obey-inline.
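A minimal sketch of that approach, assuming a GCC-compatible compiler (the FORCE_INLINE macro and the distance function are hypothetical, and getAbs's body is a guess since the question doesn't show it):

```c
/* Portable "force inline" annotation. */
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline   /* only a hint on other compilers */
#endif

/* Absolute value of an int. */
static FORCE_INLINE int getAbs(int x)
{
    return x < 0 ? -x : x;
}

/* An inline caller: getAbs should be expanded here, not called via bl. */
static FORCE_INLINE int distance(int a, int b)
{
    return getAbs(a - b);
}
```

With GCC, a function marked always_inline is inlined even without optimization, and failure to inline it is reported as an error rather than silently emitting a call.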

The inline keyword is a suggestion to the compiler, nothing more. It's free to take that suggestion on board, totally ignore it, or even lie to you and tell you that it's doing it while it really isn't.
The only way to force code to be inline is to, well, write it inline. But even then, the compiler may decide it knows better and shift it out to another function. It has a lot of leeway in generating executable code for your particular source, provided it doesn't change the semantics of it.
Modern compilers are more than capable of generating better code than most developers would hand-craft in assembly. I think the inline keyword should go the same path as the register keyword.
If you've seen the output of gcc at its insane optimisation level, you'll understand why. It has produced code that I wouldn't have dreamed possible, and that took me a long time to understand.
As an aside, check out the GCC documentation on optimization options to see what optimisations gcc actually has, including a great many containing the text "inline" or "inlining".

#gramm: There are quite a few scenarios in which inline isn't necessarily to your benefit. Most compilers use some very advanced heuristics to determine when to inline. When discussing inlining, the simplest idea is: trust your compiler to produce the fastest code.

I have recently had a very similar problem, and reading this post has given me a wacky idea. Why not have a simple pre-compilation code parser (a simple regex should do the job) that replaces the function call with the function's source code, actually putting it in-line? Use tags such as /*inline*/ and /*end_of_inline*/ so that you can still use normal IDE features (if you use, or might use, an IDE).
Include this in your build process; that way you keep the readability advantage while also removing the compiler's assumption that you are only as good a developer as most and do not understand when to in-line.
Nonetheless, before trying this, you should probably go through the compiler's command-line options.

I would suggest that if your getAbs() function (sounds like absolute value but you really should be showing us code with the question...) is 4 lines long, then you have much bigger optimizations to worry about than whether the code gets inlined or not.
