sdcc inline asm() not working


I'm using SDCC (not GCC, as I originally wrote) with the Eclipse IDE to compile C code for an 8051 architecture embedded target. I need to insert a few NOPs for timing, and I can't get the compiler to accept inline assembly code.
With __asm__ ("; This is a comment\nlabel:\n\tnop"); (as suggested below) or variations I get warning 112: function '__asm__' implicit declaration and then error 101: too many parameters, as if I'm trying to call an undeclared function. I've tried all other options in the SDCC manual section 3.14 also. __asm ... __endasm gives a syntax error on __asm, same with a single underbar, and combinations of whitespace, newlines, or the same line don't help.
If I'm piecing together the command line from the Makefile correctly (without the #include path), the CFLAGS on the SDCC command line are:
-Wp,-MD,$(#:%.rel=%.d),-MT,$#,-MP --disable-warning 110 -Wa,-p --model-medium

Moved from comment
In the sources of SDCC 3.1.0's lexer, I see that both _asm/_endasm and __asm/__endasm are supported. I haven't yet noticed support for __asm("string") in the parser.
Also in the lexer's code, the lexing type of the inline assembly token "blob" gets changed to CPP_ASM only if a property called preproc_asm is set to 0, as can be seen in sdcc/support/cpp/libcpp/lex.c:1900.
    result->type = CPP_NAME;
    {
        struct normalize_state nst = INITIAL_NORMALIZE_STATE;
        result->val.node.node = lex_identifier (pfile, buffer->cur - 1, false,
                                                &nst);
        warn_about_normalization (pfile, result, &nst);
    }
    /* SDCC _asm specific */
    /* handle _asm ... _endasm ; */
    if (result->val.node.node == pfile->spec_nodes.n__asm || result->val.node.node == pfile->spec_nodes.n__asm1)
    {
        if (CPP_OPTION (pfile, preproc_asm) == 0)
        {
            comment_start = buffer->cur;
            result->type = CPP_ASM;
            _sdcpp_skip_asm_block (pfile);
            /* Save the _asm block as a token in its own right. */
            _sdcpp_save_asm (pfile, result, comment_start, result->val.node.node == pfile->spec_nodes.n__asm);
        }
        result->flags |= ENTER_ASM;
    }
    else if (result->val.node.node == pfile->spec_nodes.n__endasm || result->val.node.node == pfile->spec_nodes.n__endasm1)
    {
        result->flags |= EXIT_ASM;
    }
    /* Convert named operators to their proper types. */
    else if (result->val.node.node->flags & NODE_OPERATOR)
    {
        result->flags |= NAMED_OP;
        result->type = (enum cpp_ttype) result->val.node.node->directive_index;
    }
    break;
The solution was to add #pragma preproc_asm - (or +) at the top of the file and to use the multiline __asm/__endasm blocks.
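For reference, here is a minimal sketch of what that fix might look like in a source file. It assumes that SDCC predefines __SDCC (older releases are assumed to define SDCC instead), and delay_three_nops is a hypothetical name. On any other compiler the function body is empty, so the file still builds on a host toolchain.

```c
/* Sketch only: the guard assumes SDCC predefines __SDCC (older releases
   are assumed to define SDCC). delay_three_nops is a hypothetical name. */
#if defined(__SDCC) || defined(SDCC)
/* Turn off C preprocessing inside __asm ... __endasm blocks. */
#pragma preproc_asm -
#endif

void delay_three_nops(void)
{
#if defined(__SDCC) || defined(SDCC)
    __asm
        nop
        nop
        nop
    __endasm;
#endif
    /* Other compilers see an empty body. */
}
```

The multiline form keeps each instruction on its own line, which is what the assembler expects.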

This link, http://www.crossware.com/smanuals/c8051/_t243.html, has this to say about inline assembly code:
Assembler code can be embedded into your C source code in two ways:
using the #asm/#endasm preprocessor directives
using the _asm keyword
The pre-processor directives #asm and #endasm allow assembler code to be included anywhere within the C source code file, the only restriction being that it cannot be positioned within an expression. All lines between #asm and #endasm are passed straight through unmodified to the intermediate file processed by the assembler and so all of the rules for the cross assembler source code are supported.
The pre-processor directives #if, #ifdef, #ifndef, #else, #elif and #endif are valid between #asm and #endasm and so can be used to maintain the assembler code if required.
The _asm keyword can only be used within functions. It is used with the following syntax:
_asm(<string constant>);
The string constant is passed straight through unmodified as a single line to the intermediate file processed by the assembler. Each string constant should therefore be a valid line of assembler code.
One advantage of the _asm syntax is that it is subject to token replacement by the C preprocessor. Therefore the statement can be generated by a series of macros.
Also with the _asm syntax, the compiler supports a special construct to enable easy access to C variables. If the variable name is placed in the string constant within curly braces, the compiler replaces the variable name (and the curly braces) with the appropriate substring depending upon the location of the variable. See the following sections for more details.
The compiler generates upper case mnemonics and so if lower case is chosen for the in-line assembler code it can be clearly distinguished from the compiler generated code in the list file.
However, the correct format is '_asm(" nop");', because an assembly mnemonic cannot be the first thing on a line (that position is reserved for labels).

Related

Define a unique and global assembly label/symbol inside C functions

I want to mark specific C lines with a kind of assembler label/symbol that does not occupy any space in the binary. By examining the linker's output map file I would then know every occurrence of such generated labels and, eventually, of the C code that was "marked" this way. So I want to be able to define such labels, make them global, and mark them as used so that the linker does not throw them away.
I also need some macro magic so that those labels get a unique name each time the C code is preprocessed (to make sure each inlined instance of the function has its own label; otherwise I will have duplicate symbols, I guess).
Example :
// my build system will pass -DMYFILE_ID for each file, here I am trying to create a unique literal for each inline instance of the function
#define UN(X) #X
#define UNIQUE(X,Y) UN(X##Y)
void my_func(void)
{
_asm("GLOBAL_LABEL_"UNIQUE(MYFILE_ID,__LINE__)":\n\t");
my_c_code_I_want_to_track();
}
And what I would like to have at the end, is in the linker output symbols map file, something like that
0xsome_address GLOBAL_LABEL_12_1
0xdifferent_address GLOBAL_LABEL_12_2
0xyetanotheraddress GLOBAL_LABEL_13_1
which basically should give me an idea of the addresses at which my_c_code_I_want_to_track got instantiated.
The whole idea is inspired by how labels in assembly are actually "symbols" that have a placement, so their addresses can be checked, but they don't occupy any space of their own.
Problems :
1. Is it even possible to define assembly labels like that?
2. How do I make those labels stay and appear in the output symbols map file?
3. Something is wrong with the UNIQUE macro, as I get "label redefined" when trying to compile.

You can use %= (e.g. label%=:) inside an Extended-asm template to get the compiler to generate a unique number to avoid name collisions when a function containing inline-asm is inlined multiple times in one compilation unit.
#define STRINGIFY(x) #x
#define STR(x) STRINGIFY(x)
int foo(int x) {
asm("marker" __FILE__ "_line" STR(__LINE__) "_uniqueid%=:" :::);
return x+1;
}
int caller1(int x) {
return foo(x);
}
int caller2(int x) {
return foo(x);
}
compiles to the following asm with gcc -O3 (on Godbolt):
foo(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid7:
lea eax, [rdi+1]
ret
caller1(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid22:
lea eax, [rdi+1]
ret
caller2(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid41:
lea eax, [rdi+1]
ret
This of course won't assemble because / isn't a valid label character in GAS.
Using MYFILE_ID which contains only characters that can appear in symbol names, this would assemble just fine, and you should be able to see all the marker labels in nm output.

One problem is that you may get multiple copies of the same label due to inlining. Add the following attribute to functions containing these labels:
__attribute__((noinline))
Also note that you need to mark the symbol as global. Let's extract this into a macro so we can format nicely without changing the value of __LINE__:
#define MAKE_LABEL \
__asm__( \
"GLOBAL_LABEL_" UNIQUE(MYFILE_ID, __LINE__) ":" \
"\n\t.global GLOBAL_LABEL_" UNIQUE(MYFILE_ID, __LINE__) \
)
But the macro-expansion is off. Unfortunately, I cannot explain to you why this works. But here is the correct macro definition:
#define UN(X) #X
#define UNIQUE2(X,Y) UN(X##Y)
#define UNIQUE(X,Y) UNIQUE2(X,Y)
Otherwise you will get __LINE__ instead of, say, 23.
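As far as I can tell, the extra level helps because a macro argument that is an operand of # or ## is substituted without being macro-expanded first. Passing the arguments through one more macro lets them expand before the paste happens. A small sketch that makes the difference visible as strings; FILE_ID, LINE_NO, ONE_LEVEL and the two function names are hypothetical, and LINE_NO stands in for __LINE__ so that the result is fixed:

```c
#define UN(X) #X
#define UNIQUE2(X, Y) UN(X##Y)
#define UNIQUE(X, Y) UNIQUE2(X, Y)   /* extra level: X and Y expand first */
#define ONE_LEVEL(X, Y) UN(X##Y)     /* the single-level version from the question */

/* Hypothetical stand-ins so the result does not depend on the line number. */
#define FILE_ID 12
#define LINE_NO 34

/* Arguments are expanded to 12 and 34, pasted to 1234, then stringified. */
const char *two_level_name(void)
{
    return "GLOBAL_LABEL_" UNIQUE(FILE_ID, LINE_NO);
}

/* The macro names themselves are pasted, and the result is stringified. */
const char *one_level_name(void)
{
    return "GLOBAL_LABEL_" ONE_LEVEL(FILE_ID, LINE_NO);
}
```

Here two_level_name() yields "GLOBAL_LABEL_1234", while one_level_name() yields "GLOBAL_LABEL_FILE_IDLINE_NO". Note that neither version inserts an underscore between the two parts.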

Will the compiler allocate any memory for code disabled by macro in C language?

For example:
int main()
{
fun();//calling a fun
}
void fun(void)
{
#if 0
int a = 4;
int b = 5;
#endif
}
What is the size of the fun() function? And how much memory in total will be allocated for the main() function?

Compilation of a C source file is done in multiple phases. The phase where the preprocessor runs is done before the phase where the code is compiled.
The "compiler" will not even see code that the preprocessor has removed; from its point of view, the function is simply
void fun(void)
{
}
Whether the function will "create memory" depends on the compiler and its optimization settings. For a debug build the function will probably still exist and be called. For an optimized release build the compiler might not call or even keep (generate boilerplate code for) the function.
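A small sketch that illustrates the point: the text inside #if 0 only has to be a valid sequence of preprocessing tokens, because the compiler proper never sees it. fun_result is a hypothetical name chosen for this illustration.

```c
/* fun_result is a hypothetical name; the skipped text is deliberately
   not valid C, only a valid sequence of preprocessing tokens. */
int fun_result(void)
{
    int result = 1;
#if 0
    result = 2;
    this line is not valid C, only a valid sequence of tokens
#endif
    return result;
}
```

The function compiles and returns 1, since both lines between #if 0 and #endif are removed before compilation.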

Compilation is split into 4 stages:
Preprocessing.
Compilation.
Assembly.
Linking.
The preprocessor directives are processed before the actual compilation starts, and conditional inclusion is resolved in this stage along with the other directives.
The #if is a conditional inclusion directive.
From C11 draft 6.10.1-3:
Preprocessing directives of the forms
#if constant-expression new-line group(opt)
#elif constant-expression new-line group(opt)
check whether the controlling constant expression evaluates to nonzero.
In your code, the controlling expression of #if 0 evaluates to zero, so the code within the conditional block is excluded.
The preprocessing stage can be output to stdout with -E option:
gcc -E filename.c
from the command above the output will give,
# 943 "/usr/include/stdio.h" 3 4
# 2 "filename.c" 2
void fun(void)
{
}
int main()
{
fun();
return 0;
}
As we can see the statements with the #if condition are removed during the preprocessing stage.
This directive can be used to avoid compilation of certain code block.
Now to see if there is any memory allocated by the compiler for an empty function,
filename.c:
void fun(void)
{
}
int main()
{
fun();
return 0;
}
The size command gives,
$ size a.out
text data bss dec hex filename
1171 552 8 1731 6c3 a.out
and for the code,
filename.c:
void fun(void)
{
#if 0
int a = 4;
int b = 5;
#endif
}
int main()
{
fun();
return 0;
}
The output of size command for the above code is,
$ size a.out
text data bss dec hex filename
1171 552 8 1731 6c3 a.out
As seen, the sizes are the same in both cases, from which we can conclude that the compiler does not allocate memory for a block of code disabled by a macro.

According to the GCC reference:
The simplest sort of conditional is
#ifdef MACRO
controlled text
#endif /* MACRO */
This block is called a conditional group. controlled text will be
included in the output of the preprocessor if and only if MACRO is
defined. We say that the conditional succeeds if MACRO is defined,
fails if it is not.
The controlled text inside of a conditional can include preprocessing
directives. They are executed only if the conditional succeeds. You
can nest conditional groups inside other conditional groups, but they
must be completely nested. In other words, ‘#endif’ always matches the
nearest ‘#ifdef’ (or ‘#ifndef’, or ‘#if’). Also, you cannot start a
conditional group in one file and end it in another.
Even if a conditional fails, the controlled text inside it is still
run through initial transformations and tokenization. Therefore, it
must all be lexically valid C. Normally the only way this matters is
that all comments and string literals inside a failing conditional
group must still be properly ended.
The comment following the ‘#endif’ is not required, but it is a good
practice if there is a lot of controlled text, because it helps people
match the ‘#endif’ to the corresponding ‘#ifdef’. Older programs
sometimes put MACRO directly after the ‘#endif’ without enclosing it
in a comment. This is invalid code according to the C standard. CPP
accepts it with a warning. It never affects which ‘#ifndef’ the
‘#endif’ matches.
Sometimes you wish to use some code if a macro is not defined. You can
do this by writing ‘#ifndef’ instead of ‘#ifdef’. One common use of
‘#ifndef’ is to include code only the first time a header file is
included.
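The header-guard use mentioned at the end of that quote can be sketched in a single file by writing the guarded text twice, which simulates including the same header twice. POINT_H, struct point and point_sum are hypothetical names.

```c
/* Simulates including the same header twice in one translation unit.
   POINT_H, struct point and point_sum are hypothetical names. */

/* First "inclusion": POINT_H is not defined yet, so the group is kept. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second "inclusion": POINT_H is now defined, so the group is skipped
   and struct point is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard, the second definition of struct point would be a redefinition error.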

Does __asm{}; return the value of eax?

Simple question. The function asm in C is used to do inline assembly in your code. But what does it return? Is it the conventional eax, and if not, what does it return?

__asm__ itself does not return a value. The C standard does not define how __asm__ should handle a return value, so the behavior may differ between compilers. You stated that the Visual Studio example is valid, but Visual Studio uses __asm; __asm__ is used by GCC, among others.
Visual Studio
To get the result in a C program, you can place the return value in eax in the assembly code, and return from the function. The caller will receive the contents of eax as the return value. This is supported even with optimization enabled, even if the compiler decides to inline the function containing the __asm{} block.
It avoids a store/reload you'd otherwise get from moving the value to a C variable in the asm and returning that C variable, because MSVC inline asm syntax doesn't support inputs/outputs in registers (except for this return-value case).
Visual Studio 2015 documentation:
int power2( int num, int power )
{
__asm
{
mov eax, num ; Get first argument
mov ecx, power ; Get second argument
shl eax, cl ; EAX = EAX * ( 2 to the power of CL )
}
// Return with result in EAX
// by falling off the end of a non-void function
}
clang -fasm-blocks supports the same inline-asm syntax but does not support falling off the end of a non-void function as returning the value that an asm{} block left in EAX/RAX. Beware of that if porting MSVC inline asm to clang. It will break horribly when compiled with optimization enabled (function inlining).
GCC
GCC inline assembly HOWTO does not contain a similar example. You can't use an implicit return as in Visual Studio, but fortunately you don't need to because GNU C inline asm syntax allows specifying outputs in registers. No hack is needed to avoid a store/reload of an output value.
The HOWTO shows that you can store the result to C variable inside the assembly block, and return value of that variable after the assembly block has ended. You can even use "=r"(var) to let the compiler pick its choice of register, in case EAX isn't the most convenient after inlining.
An example of an (inefficient) string copy function, returning value of dest:
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__( "1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0" (src),"1" (dest)
: "memory");
return dest;
}
(Note that dest isn't actually an output from the inline asm statement. The matching constraint for the dummy output operands tells the compiler the inline asm destroyed that copy of the variable so it needs to preserve it across the asm statement on its own somehow.)
If you omit a return statement in a non-void function with optimization enabled, you get a warning like warning: no return statement in function returning non-void [-Wreturn-type] and recent GCC/clang won't even emit a ret; it assumes this path of execution is never taken (because that would be UB). It doesn't matter whether or not the function contained an asm statement or not.
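A shorter sketch of the "=r" output approach described above. It assumes GCC or Clang targeting x86 with the default AT&T syntax, and falls back to plain C elsewhere; add_one is a hypothetical name.

```c
/* add_one is a hypothetical name. The asm path assumes GCC or Clang on
   x86 with the default AT&T syntax; other targets use plain C. */
int add_one(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    __asm__ ("addl $1, %0"
             : "=r" (result)   /* output: any general-purpose register */
             : "0" (x)         /* input: starts in the same register as %0 */
             : "cc");          /* the add modifies the flags */
    return result;             /* ordinary C return of the asm output */
#else
    return x + 1;
#endif
}
```

The value reaches the caller through a normal return statement, so nothing depends on which register the compiler picked.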

It's unlikely; per the C99 spec, under J.5.10 (in Annex J.5, Common extensions):
The asm keyword may be used to insert assembly language directly into
the translator output (6.8). The most common implementation is via a statement of the form:
asm ( character-string-literal );
So it's unlikely that an implementor is going to come up with an approach that both inserts the assembly language into the translator output and also generates some additional intermediary linking code to wire a particular register as a return result.
It's a keyword, not a function.
E.g. GCC uses "=r"-type constraint semantics to allow you in your assembly to have write access to a variable. But you ensure the result ends up in the right place.

Inline assembler: Pass a constant

I have the following problem: I want to use the following assembler code from my C source files using inline assembler:
.word 1
The closest I've gotten is using this inline assembler code:
asm(".word %0\n": : "i"(1));
However, this results in the following code in the generated assembler file:
.word #1
So I need a way to pass a constant that is known at compile time without adding the '#' in front of it. Is this possible using inline assembler?
Edit:
To make it more clear why I need this, this is how it will be used:
#define LABELS_PUT(b) asm(".word %0\n": : "i"((b)));
int func(void) {
LABELS_PUT(1 + 2);
return 0;
}
I can't use ".word 1" because the value will be different every time the macro LABELS_PUT is called.

Your macro has a ; at the end. So it's a whole statement, not just an expression. Don't do that.
A .word mixed in with the code of your function is usually going to be an illegal instruction, isn't it? Are you actually planning to run this binary?
You should be able to get the preprocessor to stringify your macro parameter, and let string-concatenation join it up. Then the assembler can evaluate 1+2.
#define LABELS_PUT(b) asm(".word " #b "\n")
LABELS_PUT(1+2); // becomes:
asm(".word " "1+2" "\n");
There are also operand modifiers (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers), some of which might work for other architectures:
asm (".word %c0" : : "i" (b))
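To see what the stringifying version hands to the assembler, the same string can be built without the asm() wrapper. This is only a sketch: WORD_DIRECTIVE, WORD_DIRECTIVE_EXPANDED, COUNT and the function names are hypothetical. It also shows that #b does not expand a macro argument, so an extra level of indirection is needed if the value is itself a macro.

```c
#define STRINGIFY(x) #x
#define STR(x) STRINGIFY(x)

/* Hypothetical names, used only for this illustration. */
#define WORD_DIRECTIVE(b) ".word " #b "\n"
#define WORD_DIRECTIVE_EXPANDED(b) ".word " STR(b) "\n"
#define COUNT 3

/* The expression text is passed through for the assembler to evaluate. */
const char *word_literal(void)
{
    return WORD_DIRECTIVE(1+2);
}

/* #b stringifies the macro name itself, not its value. */
const char *word_macro_raw(void)
{
    return WORD_DIRECTIVE(COUNT);
}

/* Going through STR() lets COUNT expand before it is stringified. */
const char *word_macro_expanded(void)
{
    return WORD_DIRECTIVE_EXPANDED(COUNT);
}
```

The three functions return ".word 1+2\n", ".word COUNT\n" and ".word 3\n" respectively.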

In GCC you can do it like this:
asm __volatile__ (".word 0x1");
If you are using Visual Studio then you can try:
some_word: _asm{_emit 1000b}
This places the constant 1000b directly into the code (note that each _emit defines a single byte).
You can then refer to it through the label some_word.

C; Inline assembly syntax mistake "Expected string literal before numerical constant"

When I compile the following example code (these are essentially junk assembly statements with no real purpose), I get the following error:
def-asm-pop.c:13:3: error: expected string literal before numeric constant
Line 13 is the uncommented "ASM" line:
#define iMOV "mov %eax,%ebx\n\t"
#define iNOP "nop\n\t"
#define iASM __asm__(iMOV iNOP)
#define MOV 0xB8
#define NOP 0x90
#define ASM __asm__(MOV NOP)
int main() {
//iASM; /* This one works when uncommented */
ASM; /* This one causes the error when uncommented */
return 0;
}
There may be an error in my Hello World style attempt at inline assembly, but that is another stepping stone for me to overcome. At this point in time it seems I can't define a list of opcodes and then define an assembly statement list built from them, in the same way I can by defining the text commands. How can I make ASM work like the iASM statement?

As the error message states, the __asm__ operator wants a string and not a number, and in that string it wants valid assembler.
You are trying to directly write binary opcodes, this has not much to do with assembler.

This might work:
#define MOV ".byte 0xB8\n"
#define NOP ".byte 0x90\n"
The exact syntax is of course dependent on your assembler (and the appropriate machine language is dependent on your target platform). This is not much use for anything other than experimenting; it is not a good way to write code.
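A sketch of the .byte approach that is safe to execute, using only the one-byte NOP opcode. It assumes GCC or Clang targeting x86 and does nothing elsewhere; NOP_BYTE and run_raw_nops are hypothetical names.

```c
/* Sketch: emits raw NOP opcodes through the assembler's .byte directive.
   Assumes GCC or Clang on x86; NOP_BYTE and run_raw_nops are hypothetical. */
#define NOP_BYTE ".byte 0x90\n\t"

int run_raw_nops(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Adjacent string literals are concatenated into one asm template. */
    __asm__ __volatile__ (NOP_BYTE NOP_BYTE);
    return 2;   /* number of NOP bytes emitted */
#else
    return 0;   /* no raw bytes emitted on other targets */
#endif
}
```

The MOV define is left out on purpose: in 32-bit and 64-bit code 0xB8 is only the first byte of mov eax, imm32, so emitting it alone would make the CPU treat the next four bytes as the immediate operand.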