This function "strcpy" aims to copy the content of src to dest, and it works out just fine: display two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried to change the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), the error occurred: ‘asm’ operand has impossible constraints. I just cannot understand. I thought that "0"/"1" here specified the same constraint as the 0th/1th output variable. the constraint of 0th output is =&S, te constraint of 1th output is =&D. If I change 0-->S, 1-->D, there shouldn't be any wrong. What's the matter with it?
Does "clobbered registers" or the earlyclobber operand(&) have any use? I try to remove "&" or "memory", the result of either circumstance is the same as the original one: output two lines of "Hello_src" strings. So why should I use the "clobbered" things?
The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. Apparently using the 0/1 style overrides that behavior.
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.
Related
I'm having trouble solving a school exercise , I'm supposed to change a char array in c using inline assembly. In this case change "ahoy" to "aXoy", but I'm getting segmentation fault. This is my code:
#include <stdio.h>
int main() {
char string[] = "ahoy";
__asm__ volatile (
"mov %0, %%eax;"
"movb $'X', 1(%%eax);"
: "=m"(string) : "0"(string) : "memory", "eax");
printf("%s\n", string);
return 0
}
with this: "mov %0, %%eax;" I'm trying to store address of the array in register
then with this: "movb $'X', 1(%%eax);" I want to store byte 'X' in location pointed to by (%%eax) offset by 1 byte(char),
I have string both as output and input, and "memory","eax" in clobber since I'm modifying both. What is wrong with my code?
Use gcc -S instead of -c to look at the compiler's assembly output and you should see what's wrong. The "m" constraint produces a memory reference expression for accessing the object associated with it, not a register containing its address. So it will expand to something like (%ecx), not %ecx, and the first mov will load 4 bytes from string, rather than the address of string, into eax.
One way to fix this would be to use a register constraint "r"(string) instead.
Alternatively you could replace the mov with lea: lea %0, %%eax.
There are other issues with your code too like the useless temporary/clobber register eax but they shouldn't keep it from working.
I am writing Inline assembly for the first time and I don't know why I'm getting a Seg fault when I try to run it.
#include <stdio.h>
int very_fast_function(int i){
asm volatile("movl %%eax,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%%eax;"
"jmp End;"
"Return: movl $0,%%eax;"
"End: ret;": "=eax" (i) : "eax" (i) : "eax", "ebx" );
return i;
/*if ( (i*64 +1) > 1024) return ++i;
else return 0;*/
}
int main(int argc, char *argv[])
{
int i;
i=40;
printf("The function value of i is %d\n", very_fast_function(i));
return 0;
}
Like I said this is my first time so if it's super obvious I apologize.
You shall not use ret directly. Reason: there're initialization like push the stack or save the frame pointer when entering each function, also there're corresponding finalization. You just leave the stack not restored if use ret directly.
Just remove ret and there shall not be segmentation fault.
However I suppose the result is not as expected. The reason is your input/output constrains are not as expected. Please notice "=eax" (i) you write does not specify to use %%eax as the output of i, while it means to apply constraint e a and x on output variable i.
For your purpose you could simply use r to specify a register. See this edited code which I've just tested:
asm volatile("movl %1,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%0;"
"jmp End;"
"Return: movl $0,%0;"
"End: ;": "=r" (i) : "r" (i) : "ebx" );
Here To use %%eax explicitly, use "=a" instead of "=r".
For further information, please read this http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
ret should not be used in inline assembly blocks - the function you're in needs some cleanup beyond what a simple ret will handle.
Remember, inline assembly is inserted directly into the function it's embedded in. It's not a function unto itself.
I've seen the post about the same error but i'm still get error :
too many memory references for `mov'
junk `hCPUIDmov buffer' after expression
... here's the code (mingw compiler / C::B) :
#include iostream
using namespace std;
union aregister
{
int theint;
unsigned bits[32];
};
union tonibbles
{
int integer;
short parts[2];
};
void GetSerial()
{
int part1,part2,part3;
aregister issupported;
int buffer;
__asm(
"mov %eax, 01h"
"CPUID"
"mov buffer, edx"
);//do the cpuid, move the edx (feature set register) to "buffer"
issupported.theint = buffer;
if(issupported.bits[18])//it is supported
{
__asm(
"mov part1, eax"
"mov %eax, 03h"
"CPUID"
);//move the first part into "part1" and call cpuid with the next subfunction to get
//the next 64 bits
__asm(
"mov part2, edx"
"mov part3, ecx"
);//now we have all the 96 bits of the serial number
tonibbles serial[3];//to split it up into two nibbles
serial[0].integer = part1;//first part
serial[1].integer = part2;//second
serial[2].integer = part3;//third
}
}
Your assembly code is not correctly formatted for gcc.
Firstly, gcc uses AT&T syntax (EDIT: by default, thanks nrz), so it needs a % added for each register reference and a $ for immediate operands. The destination operand is always on the right side.
Secondly, you'll need to pass a line separator (for example \n\t) for a new line. Since gcc passes your string straight to the assembler, it requires a particular syntax.
You should usually try hard to minimize your assembler since it may cause problems for the optimizer. Simplest way to minimize the assembler required would probably be to break the cpuid instruction out into a function, and reuse that.
void cpuid(int32_t *peax, int32_t *pebx, int32_t *pecx, int32_t *pedx)
{
__asm(
"CPUID"
/* All outputs (eax, ebx, ecx, edx) */
: "=a"(*peax), "=b"(*pebx), "=c"(*pecx), "=d"(*pedx)
/* All inputs (eax) */
: "a"(*peax)
);
}
Then just simply call using;
int a=1, b, c, d;
cpuid(&a, &b, &c, &d);
Another possibly more elegant way is to do it using macros.
Because of how C works,
__asm(
"mov %eax, 01h"
"CPUID"
"mov buffer, edx"
);
is equivalent to
__asm("mov %eax, 01h" "CPUID" "mov buffer, edx");
which is equivalent to
__asm("mov %eax, 01hCPUIDmov buffer, edx");
which isn't what you want.
AT&T syntax (GAS's default) puts the destination register at the end.
AT&T syntax requires immediates to be prefixed with $.
You can't reference local variables like that; you need to pass them in as operands.
Wikipedia's article gives a working example that returns eax.
The following snippet might cover your use-cases (I'm not intricately familiar with GCC inline assembly or CPUID):
int eax, ebx, ecx, edx;
eax = 1;
__asm( "cpuid"
: "+a" (eax), "+b" (ebx), "+c" (ecx), "+d" (edx));
buffer = edx
Somebody over at SO posted a question asking how he could "hide" a function. This was my answer:
#include <stdio.h>
#include <stdlib.h>
int encrypt(void)
{
char *text="Hello World";
asm("push text");
asm("call printf");
return 0;
}
int main(int argc, char *argv[])
{
volatile unsigned char *i=encrypt;
while(*i!=0x00)
*i++^=0xBE;
return EXIT_SUCCESS;
}
but, there are problems:
encode.c: In function `main':
encode.c:13: warning: initialization from incompatible pointer type
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0xf): undefined reference to `text'
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0x14): undefined reference to `printf'
collect2: ld returned 1 exit status
My first question is why is the inline assembly failing ... what would be the right way to do it? Other thing -- the code for "ret" or "retn" is 0x00 , right... my code xor's stuff until it reaches a return ... so why is it SEGFAULTing?
As a high level point, I'm not quite sure why you're trying to use inline assembly to do a simple call into printf, as all you've done is create an incorrect version of a function call (your inline pushes something onto the stack, but never pop it off, most likely causing problems cause GCC isn't aware that you've modified the stack pointer in the middle of the function. This is fine in a trivial example, but could lead to non-obvious errors in a more complicated function)
Here's a correct implementation of your top function:
int encrypt(void)
{
char *text="Hello World";
char *formatString = "%s\n";
// volatile really isn't necessary but I just use it by habit
asm volatile("pushl %0;\n\t"
"pushl %1;\n\t"
"call printf;\n\t"
"addl $0x8, %%esp\n\t"
:
: "r"(text), "r"(formatString)
);
return 0;
}
As for your last question, the usual opcode for RET is "C3", but there are many variations, have a look at http://pdos.csail.mit.edu/6.828/2009/readings/i386/RET.htm
Your idea of searching for RET is also faulty as due to the fact that when you see the byte 0xC3 in a random set of instructions, it does NOT mean you've encountered a ret. As the 0xC3 may simply be the data/attributes of another instruction (as a side note, it's particularly hard to try and parse x86 instructions as you're doing due to the fact x86 is a CISC architecture with instruction lengths between 1-16 bytes)
As another note, not all OS's allow modification to the text/code segment (Where executable instructions are stored), so the the code you have in main may not work regardless.
GCC inline asm uses AT&T syntax (if no specific options are selected for using Intel's one).
Here's an example:
int a=10, b;
asm ("movl %1, %%eax;
movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
Thus, your problem is that "text" is not identifiable from your call (and following instruction too).
See here for reference.
Moreover your code is not portable between 32 and 64 bit environments. Compile it with -m32 flag to ensure proper analysis (GCC will complain anyway if you fall in error).
A complete solution to your problem is on this post on GCC Mailing list.
Here's a snippet:
for ( i = method->args_size - 1; i >= 0; i-- ) {
asm( "pushl %0": /* no outputs */: \
"g" (stack_frame->op_stack[i]) );
}
asm( "call *%0" : /* no outputs */ : "g" (fp) :
"%eax", "%ecx", "%edx", "%cc", "memory" );
asm ( "movl %%eax, %0" : "=g" (ret_value) : /* No inputs */ );
On windows systems there's also an additional asm ( "addl %0, %%esp" : /* No outputs */ : "g" (method->args_size * 4) ); to do. Google for better details.
It is not printf but _printf
My Code
const int howmany = 5046;
char buffer[howmany];
asm("lea buffer,%esi"); //Get the address of buffer
asm("mov howmany,%ebx"); //Set the loop number
asm("buf_loop:"); //Lable for beginning of loop
asm("movb (%esi),%al"); //Copy buffer[x] to al
asm("inc %esi"); //Increment buffer address
asm("dec %ebx"); //Decrement loop count
asm("jnz buf_loop"); //jump to buf_loop if(ebx>0)
My Problem
I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.
Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)
If gcc, see the details here, and in particular this example:
asm ("leal (%1,%1,4), %0"
: "=r" (five_times_x)
: "r" (x)
);
%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
The part that declares an array like that
int howmany = 5046;
char buffer[howmany];
is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.
If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be
mov buffer,%esi
and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.
Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to
const int howmany = 5046;
the array will turn into an "normal" C++ array and your code might start working as is (i.e. with lea).
All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.
You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).
Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement, usually that means you're doing it wrong and writing inefficient code.
And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.
How to write your loop properly
If this looks over-complicated, then https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).
Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.
I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.
Source + compiler asm output on the Godbolt compiler explorer.
void ext(char *);
int foo(void)
{
int howmany = 5046; // could be a function arg
char buffer[howmany];
//ext(buffer);
const char *bufptr = buffer; // copy the pointer to a C var we can use as a read-write operand
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop \n\t" // } while(ebx>0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, [ptr] "+r" (bufptr)
: // no input-only operands
: "memory" // we read memory that isn't an input operand, only pointed to by inputs
);
return result;
}
I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory matches the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).
gcc7.2 -O3 output:
pushq %rbp
movl $5046, %edx
movq %rsp, %rbp
subq $5056, %rsp
movq %rsp, %rcx # compiler-emitted to satisfy our "+r" constraint for bufptr
# start of the inline-asm block
buf_loop18:
movb (%rcx), %al
inc %rcx
dec %edx
jnz buf_loop
# end of the inline-asm block
movzbl %al, %eax
leave
ret
Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.
A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.
In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.
int bar(unsigned howmany)
{
//int howmany = 5046;
char buffer[howmany];
//ext(buffer);
buffer[0] = 1;
buffer[100] = 100; // test whether we got the input constraints right
//using input_t = const struct {char a[howmany];}; // requires a constant size
using flexarray_t = const struct {char a; char x[];};
const char *dummy;
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop \n\t" // } while(ebx>0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, "=r" (dummy) // output operand in the same register as buffer input, so we can modify the register
: [ptr] "2" (buffer) // matching constraint for the dummy output
, "m" (*(flexarray_t *) buffer) // whole buffer as an input operand
//, "m" (*buffer) // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
: // no clobbers
);
buffer[100] = 101;
return result;
}
I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)
The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.