C: Inline assembly syntax mistake "Expected string literal before numerical constant"

When I compile the following example code (these are essentially junk assembly statements with no real purpose), I get the following error:
def-asm-pop.c:13:3: error: expected string literal before numeric constant
Line 13 is the uncommented "ASM" line:
#define iMOV "mov %eax,%ebx\n\t"
#define iNOP "nop\n\t"
#define iASM __asm__(iMOV iNOP)
#define MOV 0xB8
#define NOP 0x90
#define ASM __asm__(MOV NOP)
int main() {
//iASM; /* This one works when uncommented */
ASM; /* This one causes the error when uncommented */
return 0;
}
There may be an error in my Hello World style attempt at inline assembly, but that is another stepping stone for me to overcome. At this point it seems I can't define a list of opcodes and then build an assembly statement list from them, the way I can with the text commands. How can I make ASM work like the iASM statement?

As the error message states, the __asm__ operator wants a string and not a number, and that string must contain valid assembler.
You are trying to write binary opcodes directly, which has little to do with assembler.

This might work:
#define MOV ".byte 0xB8\n"
#define NOP ".byte 0x90\n"
The exact syntax is of course dependent on your assembler (and the appropriate machine language is dependent on your target platform). This is not much use for anything other than experimenting; it is not a good way to write code.
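As a hedged illustration of the idea above: because adjacent string literals are concatenated, opcode macros behave just like the text-mnemonic macros once each one expands to a `.byte` directive string. The sketch assumes GCC or Clang with GNU assembler syntax on x86; the names NOP_BYTE, TWO_NOPS and run_two_nops are made up for this example. Note that 0xB8 on its own is only the first byte of a `mov $imm32, %eax` encoding and would swallow the next four bytes as its immediate, so the sketch sticks to the single-byte NOP (0x90).

```c
/* Assumption: GCC or Clang targeting x86. On any other target the macro
   expands to nothing, so the file still compiles. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define NOP_BYTE ".byte 0x90\n\t"   /* 0x90 is the one-byte x86 NOP */
/* Adjacent string literals are concatenated into one asm template. */
#define TWO_NOPS() __asm__ __volatile__(NOP_BYTE NOP_BYTE)
#else
#define TWO_NOPS() ((void)0)
#endif

/* Emits two raw NOP bytes (on x86) and returns its argument unchanged. */
int run_two_nops(int x)
{
    TWO_NOPS();
    return x;
}
```

Disassembling the object file (for example with objdump -d) should show the two nop instructions inside run_two_nops.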

Related

Define a unique and global assembly label/symbol inside C functions

I want to mark specific C lines with a sort of assembler label/symbol that will not occupy any space in the binary. By examining the linker output map file, I will then know all occurrences of such generated labels and, eventually, of the C code that was "marked" this way. So I want to be able to define such labels, make them global, and mark them as used so the linker does not throw them away.
I also need some macro magic so that those labels get a unique name each time the C code is preprocessed (to make sure each inlined instance of the function has its own label; otherwise I will have duplicate symbols, I guess).
Example :
// my build system will pass -DMYFILE_ID for each file, here I am trying to create a unique literal for each inline instance of the function
#define UN(X) #X
#define UNIQUE(X,Y) UN(X##Y)
void my_func(void)
{
_asm("GLOBAL_LABEL_"UNIQUE(MYFILE_ID,__LINE__)":\n\t")
my_c_code_I_want_to_track();
}
And what I would like to have at the end, is in the linker output symbols map file, something like that
0xsome_address GLOBAL_LABEL_12_1
0xdifferent_address GLOBAL_LABEL_12_2
0xyeanotheraddress GLOBAL_LABEL_13_1
which basically should give me an idea at which addresses my_c_code_I_want_to_track got instantiated.
The whole idea is sort of inspired by how labels in assembly are actually "symbols" that have a placement, so their addresses can be checked, but they don't actually occupy any space of their own.
Problems:
1. Is it even possible to define assembly labels like that?
2. How do I make those labels stay and appear in the output symbols map file?
3. Something is wrong with the UNIQUE macro, as I get "label redefined" when trying to compile.
You can use %= (e.g. label%=:) inside an Extended-asm template to get the compiler to generate a unique number to avoid name collisions when a function containing inline-asm is inlined multiple times in one compilation unit.
#define STRINGIFY(x) #x
#define STR(x) STRINGIFY(x)
int foo(int x) {
asm("marker" __FILE__ "_line" STR(__LINE__) "_uniqueid%=:" :::);
return x+1;
}
int caller1(int x) {
return foo(x);
}
int caller2(int x) {
return foo(x);
}
compiles to the following asm with gcc -O3 (on Godbolt):
foo(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid7:
lea eax, [rdi+1]
ret
caller1(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid22:
lea eax, [rdi+1]
ret
caller2(int):
marker/tmp/compiler-explorer-compiler11899-55-1ki0cth.pehm/example.cpp_line4_uniqueid41:
lea eax, [rdi+1]
ret
This of course won't assemble because / isn't a valid label character in GAS.
Using MYFILE_ID which contains only characters that can appear in symbol names, this would assemble just fine, and you should be able to see all the marker labels in nm output.
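A minimal sketch of that variant, assuming GCC or Clang and a MYFILE_ID made only of symbol-safe characters. The value 12 and the names tracked_add_one and marker_ are hypothetical, chosen for this example:

```c
#ifndef MYFILE_ID
#define MYFILE_ID 12   /* assumed value; normally passed as -DMYFILE_ID=... */
#endif

#define STRINGIFY(x) #x
#define STR(x) STRINGIFY(x)   /* extra level so MYFILE_ID and __LINE__ expand */

static inline int tracked_add_one(int x)
{
    /* The template emits only a label, no instructions. %= becomes a number
       that is unique to each emitted copy of this asm statement. */
    __asm__ __volatile__("marker_" STR(MYFILE_ID) "_line" STR(__LINE__) "_id%=:" : : );
    return x + 1;
}
```

After compiling, nm on the object file should list one marker_12_line..._id... symbol for each place the function body ended up.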
One problem is that you may get multiple copies of the same label due to inlining. Add the following attribute to functions containing these labels:
__attribute__((noinline))
Also note that you need to mark the symbol as global. Let's extract this into a macro so we can format nicely without changing the value of __LINE__:
#define MAKE_LABEL \
__asm__( \
"GLOBAL_LABEL_" UNIQUE(MYFILE_ID, __LINE__) ":" \
"\n\t.global GLOBAL_LABEL_" UNIQUE(MYFILE_ID, __LINE__) \
)
But the macro-expansion is off. Unfortunately, I cannot explain to you why this works. But here is the correct macro definition:
#define UN(X) #X
#define UNIQUE2(X,Y) UN(X##Y)
#define UNIQUE(X,Y) UNIQUE2(X,Y)
Otherwise you will get __LINE__ instead of, say, 23.
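For what it's worth, a sketch of why the extra level seems to be needed: macro arguments that are operands of # or ## are not macro-expanded before substitution, so the one-level version pastes the names MYFILE_ID and __LINE__ themselves. Every use then produces the same string, which would explain the "label redefined" error. Passing the arguments through one more function-like macro expands them first. The _BAD and _GOOD names, the MYFILE_ID value 12 and the helper functions are assumptions for illustration.

```c
#include <string.h>

#ifndef MYFILE_ID
#define MYFILE_ID 12   /* assumed value; normally supplied with -DMYFILE_ID=... */
#endif

#define UN(X) #X

/* One level: X and Y are operands of ##, so they are pasted unexpanded. */
#define UNIQUE_BAD(X, Y) UN(X##Y)

/* Two levels: UNIQUE_GOOD expands its arguments, UNIQUE2 pastes the results. */
#define UNIQUE2(X, Y) UN(X##Y)
#define UNIQUE_GOOD(X, Y) UNIQUE2(X, Y)

/* Always "MYFILE_ID__LINE__", regardless of file or line. */
static const char *bad_suffix(void)
{
    return UNIQUE_BAD(MYFILE_ID, __LINE__);
}

/* The file ID followed by the current line number, e.g. "1223" on line 23. */
static const char *good_suffix(void)
{
    return UNIQUE_GOOD(MYFILE_ID, __LINE__);
}
```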

AVR GCC, assembly C stub functions, eor and the required constant value

I'm having this code:
uint16_t swap_bytes(uint16_t x)
{
asm volatile(
"eor, %A0, %B0" "\n\t"
"eor, %B0, %A0" "\n\t"
"eor, %A0, %B0" "\n\t"
: "=r" (x)
: "0" (x)
);
return x;
}
Which translates (by avr-gcc version 4.8.1 with -std=gnu99 -save-temps) to:
.global swap_bytes
.type swap_bytes, @function
swap_bytes:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
/* #APP */
; 43 "..\lib\own\ownlib.c" 1
eor, r24, r25
eor, r25, r24
eor, r24, r25
; 0 "" 2
/* #NOAPP */
ret
.size swap_bytes, .-swap_bytes
But then the compiler complains like this:
|65|Error: constant value required|
|65|Error: garbage at end of line|
|66|Error: constant value required|
|66|Error: garbage at end of line|
|67|Error: constant value required|
|67|Error: garbage at end of line|
||=== Build failed: 6 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|
The mentioned lines are the ones with the eor commands. Why is the compiler having problems with that? The registers are even upper (>= r16) ones, where nearly all operations are possible. "constant value required" sounds to me like it expects a literal... I don't get it.
Just to clarify for future googlers:
eor, r24, r25
has an extra comma after the eor. This should be written as:
eor r24, r25
I would also encourage you (again) to consider using gcc's __builtin_bswap16. In case you are not familiar with the gcc 'builtin' functions, they are functions that are built into the compiler, and (despite looking like functions) are typically inlined. They have been written and optimized by people who understand all the ins and outs of the various processors and can take into account things you may not have considered.
I understand the desire to keep code as small as possible. And I accept that it is possible that (somehow) this builtin on your specific processor is producing sub-optimal code (I assume you have checked?). On the other hand, it may produce exactly the same code. Or it may use some even more clever trick to do this. Or it might interleave instructions from the surrounding code to take advantage of pipelining (or some other avr-specific thing that I have never heard of because I don't speak 'avr').
What's more, consider this code:
int main()
{
return __builtin_bswap16(12345);
}
Your code always takes 3 instructions to process a swap. However with builtins, the compiler can recognize that the arg is constant and compute the value at compile time instead of at run time. Hard to be more efficient than that.
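To make the comparison concrete, here is a small sketch with no inline asm at all. It assumes GCC or Clang for __builtin_bswap16; the two function names are made up for this example.

```c
#include <stdint.h>

/* Portable byte swap written in plain C. */
static uint16_t swap_bytes_portable(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* GCC/Clang builtin; the compiler can fold it when the argument is constant. */
static uint16_t swap_bytes_builtin(uint16_t x)
{
    return __builtin_bswap16(x);
}
```

Both return 0x3412 for 0x1234. Comparing the generated assembly (for example with -S) is the way to check what either version costs on a particular target.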
I could also point out the benefits of "easier to support." Writing inline asm is HARD to do correctly. And future maintainers hate to touch it cuz they're never quite sure how it works. And of course, the builtin is going to be more cross-platform portable.
Still not convinced? My last pitch: Even after you fix the commas, your inline asm code still isn't quite right. Consider this code:
int main(int argc, char *argv[])
{
return swap_bytes(argc) + swap_bytes(argc);
}
Because of the way you have written swap_bytes (i.e. using volatile), gcc must compute the value twice (see the definition of volatile). Had you omitted volatile (or if you had used the builtin, which does this correctly), it would have realized that argc doesn't change and re-used the output from the first call. Did I mention that correctly writing inline asm is HARD?
I don't know your code, constraints, level of expertise or requirements. Maybe your solution really is the best. The most I can do is to encourage you to think long and hard before using inline asm in production code.

Inline assembler: Pass a constant

I have the following problem: I want to use the following assembler code from my C source files using inline assembler:
.word 1
The closest I've gotten is using this inline assembler code:
asm(".word %0\n": : "i"(1));
However, this results in the following code in the generated assembler file:
.word #1
So I need a way to pass a constant that is known at compile time without adding the '#' in front of it. Is this possible using inline assembler?
Edit:
To make it more clear why I need this, this is how it will be used:
#define LABELS_PUT(b) asm(".word %0\n": : "i"((b)));
int func(void) {
LABELS_PUT(1 + 2);
return 0;
}
I can't use ".word 1" because the value will be different every time the macro LABELS_PUT is called.
Your macro has a ; at the end. So it's a whole statement, not just an expression. Don't do that.
A .word mixed in with the code of your function is usually going to be an illegal instruction, isn't it? Are you actually planning to run this binary?
You should be able to get the preprocessor to stringify your macro parameter, and let string-concatenation join it up. Then the assembler can evaluate 1+2.
#define LABELS_PUT(b) asm(".word " #b "\n")
LABELS_PUT(1+2); // becomes:
asm(".word " "1+2" "\n");
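The stringify-and-concatenate step can be checked on its own, without emitting anything into the instruction stream. In this sketch WORD_DIRECTIVE and directive_example are hypothetical names; LABELS_PUT would wrap the same text in asm(...). One caveat: # does not macro-expand its operand, so an argument that is itself a macro name would appear by name unless it is passed through one more level of macro.

```c
#include <string.h>

/* Builds only the directive text: ".word " + stringified argument + "\n". */
#define WORD_DIRECTIVE(b) ".word " #b "\n"

static const char *directive_example(void)
{
    return WORD_DIRECTIVE(1+2);   /* ".word " "1+2" "\n" */
}
```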
There's also https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers, some of which might work for other architectures:
asm (".word %c0" : : "i" (b))
In GCC you can do it like this:
asm __volatile__ (".word 0x1");
If you are using Visual Studio, you can try:
some_word: _asm{_emit 1000b}
This emits the constant 1000b into the code, and you can refer to it through the label some_word.

sdcc inline asm() not working

I'm using SDCC (not GCC, as I first wrote) with the Eclipse IDE to compile C code for an 8051 architecture embedded target. I need to insert a few NOPs for timing, and I can't get the compiler to accept inline assembly code.
With __asm__ ("; This is a comment\nlabel:\n\tnop"); (as suggested below) or variations I get warning 112: function '__asm__' implicit declaration and then error 101: too many parameters, as if I'm trying to call an undeclared function. I've tried all other options in the SDCC manual section 3.14 also. __asm ... __endasm gives a syntax error on __asm, same with a single underbar, and combinations of whitespace, newlines, or the same line don't help.
If I'm piecing together the command line from the Makefile correctly (without the #include path), the CFLAGS on the SDCC command line are:
-Wp,-MD,$(@:%.rel=%.d),-MT,$@,-MP --disable-warning 110 -Wa,-p --model-medium
Moved from comment
In the sources of SDCC 3.1.0's lexer, I see that both _asm/_endasm and __asm/__endasm are supported. I haven't yet noticed support for __asm("string") in the parser.
Also in the lexer's code, the lexing type of the inline assembly token "blob" gets changed to CPP_ASM only if a property called preproc_asm is set to 0, as can be seen in sdcc/support/cpp/libcpp/lex.c:1900.
result->type = CPP_NAME;
{
struct normalize_state nst = INITIAL_NORMALIZE_STATE;
result->val.node.node = lex_identifier (pfile, buffer->cur - 1, false,
&nst);
warn_about_normalization (pfile, result, &nst);
}
/* SDCC _asm specific */
/* handle _asm ... _endasm ; */
if (result->val.node.node == pfile->spec_nodes.n__asm || result->val.node.node == pfile->spec_nodes.n__asm1)
{
if (CPP_OPTION (pfile, preproc_asm) == 0)
{
comment_start = buffer->cur;
result->type = CPP_ASM;
_sdcpp_skip_asm_block (pfile);
/* Save the _asm block as a token in its own right. */
_sdcpp_save_asm (pfile, result, comment_start, result->val.node.node == pfile->spec_nodes.n__asm);
}
result->flags |= ENTER_ASM;
}
else if (result->val.node.node == pfile->spec_nodes.n__endasm || result->val.node.node == pfile->spec_nodes.n__endasm1)
{
result->flags |= EXIT_ASM;
}
/* Convert named operators to their proper types. */
else if (result->val.node.node->flags & NODE_OPERATOR)
{
result->flags |= NAMED_OP;
result->type = (enum cpp_ttype) result->val.node.node->directive_index;
}
break;
The solution was to add #pragma preproc_asm - (or +) at the top of the file and to use the multiline __asm/__endasm blocks.
This link, http://www.crossware.com/smanuals/c8051/_t243.html, has this to say about inline assembly code:
Assembler code can be embedded into your C source code in two ways:
using the #asm/#endasm preprocessor directives
using the _asm keyword
The pre-processor directives #asm and #endasm allow assembler code to be included anywhere within the C source code file, the only restriction being that it cannot be positioned within an expression. All lines between #asm and #endasm are passed straight through unmodified to the intermediate file processed by the assembler and so all of the rules for the cross assembler source code are supported.
The pre-processor directives #if, #ifdef, #ifndef, #else, #elif and #endif are valid between #asm and #endasm and so can be used to maintain the assembler code if required.
The _asm keyword can only be used within functions. It is used with following syntax:
_asm(<string constant>);
The string constant is passed straight through unmodified as a single line to the intermediate file processed by the assembler. Each string constant should therefore be a valid line of assembler code.
One advantage of the _asm syntax is that it is subject to token replacement by the C preprocessor. Therefore the statement can be generated by a series of macros.
Also with the _asm syntax, the compiler supports a special construct to enable easy access to C variables. If the variable name is placed in the string constant within curly braces, the compiler replaces the variable name (and the curly braces) with the appropriate substring depending upon the location of the variable. See the following sections for more details.
The compiler generates upper case mnemonics and so if lower case is chosen for the in-line assembler code it can be clearly distinguished from the compiler generated code in the list file.
However, the correct format is '_asm(" nop");', because a mnemonic assembly instruction cannot be the first thing on a line (that privilege is for labels).

GCC inline - push address, not its value to stack

I'm experimenting with GCC's inline assembler (I use MinGW, my OS is Win7).
Right now I'm only getting some basic C stdlib functions to work. I'm generally familiar with the Intel syntax, but new to AT&T.
The following code works nicely:
char localmsg[] = "my local message";
asm("leal %0, %%eax" : "=m" (localmsg));
asm("push %eax");
asm("call %0" : : "m" (puts));
asm("add $4,%esp");
That LEA seems redundant, however, as I can just push the value straight onto the stack. Well, due to what I believe is an AT&T peculiarity, doing this:
asm("push %0" : "=m" (localmsg));
will generate the following assembly code in the final executable:
PUSH DWORD PTR SS:[ESP+1F]
So instead of pushing the address to my string, its contents were pushed because the "pointer" was "dereferenced", in C terms. This obviously leads to a crash.
I believe this is just GAS's normal behavior, but I was unable to find any information on how to overcome this. I'd appreciate any help.
P.S. I know this is a trivial question to those who are experienced in the matter. I expect to be downvoted, but I've just spent 45 minutes looking for a solution and found nothing.
P.P.S. I realize the proper way to do this would be to call puts( ) in the C code. This is for purely educational/experimental reasons.
While inline asm is always a bit tricky, calling functions from it is particularly challenging. Not something I would suggest for a "getting to know inline asm" project. If you haven't already, I suggest looking through the very latest inline asm docs. A lot of work has been done to try to explain how inline asm works.
That said, here are some thoughts:
1) Using multiple asm statements like this is a bad idea. As the docs say: Do not expect a sequence of asm statements to remain perfectly consecutive after compilation. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.
2) Directly modifying registers (like you are doing with eax) without letting gcc know you are doing so is also a bad idea. You should either use register constraints (so gcc can pick its own registers) or clobbers to let gcc know you are stomping on them.
3) When a function (like puts) is called, while some registers must have their values restored before returning, some registers can be treated as scratch registers by the called function (ie modified and not restored before returning). As I mentioned in #2, having your asm modify registers without informing gcc is a very bad idea. If you know the ABI for the function you are calling, you can add its scratch registers to the asm's clobber list.
4) While in this specific example you are using a constant string, as a general rule, when passing asm pointers to strings, structs, arrays, etc, you are likely to need the "memory" clobber to ensure that any pending writes to memory are performed before starting to execute your asm.
5) Actually, the lea is doing something very important. The value of esp is not known at compile time, so it's not like you can perform push $12345. Someone needs to compute (esp + the offset of localmsg) before it can be pushed on the stack. Also, see second example below.
6) If you prefer intel format (and what right-thinking person wouldn't?), you can use -masm=intel.
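Points 2 and 5 can be seen in isolation in a small sketch that does nothing but the address computation. It assumes GCC or Clang with the default AT&T syntax on x86 (32- or 64-bit); address_via_lea is a made-up name. No register is hard-coded, so no clobber list is needed.

```c
/* Assumes GCC or Clang with the default AT&T syntax on x86; on any other
   target the function simply returns the pointer unchanged. */
static const char *address_via_lea(const char *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    const char *out;
    /* "m"(*p) hands the asm a memory reference to *p; lea turns that
       reference into an address. "=r" lets the compiler pick the register. */
    __asm__("lea %1, %0" : "=r"(out) : "m"(*p));
    return out;
#else
    return p;
#endif
}
```

An "m" operand always names the memory location itself, which is why using it directly with push pushes the contents rather than the address.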
Given all this, my first cut at this code looks like this. Note that this does NOT clobber puts' scratch registers. That's left as an exercise...
#include <stdio.h>
int main()
{
const char localmsg[] = "my local message";
int result;
/* Use 'volatile' since 'result' is usually not going to get used,
which might tempt gcc to discard this asm statement as unneeded. */
asm volatile ("push %[msg] \n\t" /* Push the address of the string. */
"call %[puts] \n \t" /* Call the print function. */
"add $4,%%esp" /* Clean up the stack. */
: "=a" (result) /* The result code from puts. */
: [puts] "m" (puts), [msg] "r" (localmsg)
: "memory", "esp");
printf("%d\n", result);
}
True, this doesn't avoid the lea, due to #5. However, if that's really important, try this:
#include <stdio.h>
const char localmsg[] = "my local message";
int main()
{
int result;
/* Use 'volatile' since 'result' is usually not going to get used. */
asm volatile ("push %[msg] \n\t" /* Push the address of the string. */
"call %[puts] \n \t" /* Call the print function. */
"add $4,%%esp" /* Clean up the stack. */
: "=a" (result) /* The result code. */
: [puts] "m" (puts), [msg] "i" (localmsg)
: "memory", "esp");
printf("%d\n", result);
}
Because localmsg is now a global, its address is knowable at compile time (ok, I'm simplifying a bit), and the asm produced looks like this:
push $__ZL8localmsg
call _puts
add $4,%esp
Tada.
