Operands mismatch for mul when inserting asm into C

I'm trying to insert assembly into C code. However, when I try to multiply two registers inside it, I get an operands mismatch error. I tried "mul %%bl, %%cl\n" (double %% because it's in C code). From my past experience with asm I think this should work. I also tried "mul %%cl\n" (moving bl to al first), but in that case I get tons of errors from the linker:
zad3:(.rodata+0x4): multiple definition of `len'
/tmp/ccJxYyIp.o:(.rodata+0x0): first defined here
zad3: In function `_fini':
(.fini+0x0): multiple definition of `_fini'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o:(.fini+0x0): first defined here
zad3: In function `data_start':
(.data+0x0): multiple definition of `__data_start'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.data+0x0): first defined here
zad3: In function `data_start':
(.data+0x8): multiple definition of `__dso_handle'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o:(.data+0x0): first defined here
zad3:(.rodata+0x0): multiple definition of `_IO_stdin_used'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:(.rodata.cst4+0x0): first defined here
zad3: In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o: (.text+0x0): first defined here
zad3: In function `data_start':
(.data+0x10): multiple definition of `str'
/tmp/ccJxYyIp.o:(.data+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `str' changed from 4 in /tmp/ccJxYyIp.o to 9 in zad3
zad3: In function `main':
(.text+0xf6): multiple definition of `main'
/tmp/ccJxYyIp.o:zad3.c:(.text+0x0): first defined here
zad3: In function `_init':
(.init+0x0): multiple definition of `_init'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o:(.init+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/5/crtend.o:(.tm_clone_table+0x0): multiple definition of `__TMC_END__'
zad3:(.data+0x20): first defined here
/usr/bin/ld: error in zad3(.eh_frame); no .eh_frame_hdr table will be created.
collect2: error: ld returned 1 exit status
From what I understand, it tells me I defined len and a few other variables several times, but I cannot see this multiple definition.
The goal of my program is to take a string of digits and compute their sum weighted by powers of 2. So let's say the string is 293, then I want to compute 2*2^2+9*2^1+3*2^0.
Code:
#include <stdio.h>
char str[] = "543";
const int len = 3;
int main(void)
{
asm(
"mov $0, %%rbx \n"
"mov $1, %%rcx \n"
"potega: \n"
"shl $1, %%cl \n"
"inc %%rbx \n"
"cmp len, %%ebx \n"
"jl potega \n"
"mov $0, %%rbx \n"
"petla: \n"
"mov (%0, %%rbx, 1), %%al \n"
"sub $48, %%al \n"
"mul %%al, %%cl \n"
"shr $1, %%cl \n"
"add $48, %%al \n"
"mov %%al, (%0, %%rbx, 1) \n"
"inc %%rbx \n"
"cmp len, %%ebx \n"
"jl petla \n"
:"r"(&str)
:"%rax", "%rbx", "%rcx"
);
printf("Wynik: %s\n", str);
return 0;
}

While I try to avoid "doing people's homework" for them, you have already solved this and given that it has been over a week, have probably already turned it in.
So, looking at your final solution, there are a few things you might want to consider doing differently. In no particular order:
1. Comments. While all code needs comments, asm REALLY needs comments. As you'll see from my solution (below), having comments alongside the code really helps clarify what the code does. It might seem like a homework project hardly needs them. But since you posted this here, 89 people have tried to read this code. Comments would have made this easier for all of us. Not to mention that it will make life easier for your 'future self,' when you come back months from now to try to maintain it. Comments. Nuff said.
2. Zeroing registers. While mov $0, %%rbx will indeed put zero in rbx, this is not the most efficient way to zero a register. Using xor %%rbx, %%rbx is both (microscopically) faster and produces (slightly) smaller executable code.
3. potega. Without comments, it took me a bit to sort out what you were doing in your first loop. You are using rbx to keep track of how many characters you have processed, and cl gets shifted one to the left for each character. A few thoughts here:
3a. First thing I'd do is look at moving the shl $1, %%cl out of the loop. Instead of doing both increment and shift, just count the characters, then do a single shift of the appropriate size. This is (slightly) complicated by the fact that if you want to shift by a variable amount, the amount must be specified in cl (ie shl %%cl, %%rbx). Why cl? Who knows? That's just how shl works. So you'd want to do the counting in cl instead of rbx.
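3b. Second thing about this loop has to do with len1. Since you already know the size (it's in len1), why would you even need a loop? Perhaps a more sensible approach would be to skip the loop entirely: load len1 into cl and do a single variable shift (mov len1, %%ecx, then shl %%cl, %%rbx).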
3c. Strings in C are terminated with a null character (aka 0). If you want to find the length of a string, normally you'd walk the string until you find it. This removes the requirement to even have len1.
3d. Your code assumes that the input string is valid. What would happen if you got passed "abc"? Or ""? Validating parameters is boring, time consuming, and makes the program bigger and run slower. On the other hand, it pays HUGE dividends when something unexpected goes wrong. At the very least you should specify your assumptions about your input.
3e. Using global variables is usually a bad idea. You run into naming collisions (2 files both using the name len1), code in several different files all changing the value (making bugs difficult to track down) and it can make your program bigger than it needs to be. There are times when globals are useful, but this does not appear to be one of them. The only purpose here seems to be to allow access to these variables from within the asm, and there are other ways to do that.
3f. You use %0 to refer to str. That works (and is better than accessing the global symbol directly), but it is harder to read than it needs to be. You can associate a name with the parameter and use that instead.
Let's take a break for a moment to see what we've got so far:
"xor %%rcx, %%rcx\n" // Zero the strlen count
// Count how many characters in string
"potega%=: \n\t"
"mov (%[pstr], %%rcx), %%bl\n\t" // Read the next char
"test %%bl, %%bl \n\t" // Check for 0 at end of string
"jz kont%= \n\t"
"cmp $'0', %%bl\n\t" // Ensure digit is 0-9
"jl gotowe%=\n\t"
"cmp $'9', %%bl\n\t"
"jg gotowe%=\n\t"
"inc %%rcx \n\t" // Increment index/len
"jmp potega%= \n"
"kont%=:\n\t"
// rcx = the number of character in the string excluding null
You'll notice that I'm using %= at the end of all the labels. You can read about what this does in the gcc docs, but mostly it just appends a number to the labels. Why do that? Well, if you wanted to try computing multiple strings in a single run (like I do below), you might call this code several times. But compilers (being the tricky devils that they are) might choose to "inline" your assembler. That would mean you'd have several chunks of code that all had the same label names in the same routine. Which would cause your compile to fail.
Note that I don't check to see if the string is "too long" or NULL. Left as an exercise for the student...
Ok, what else?
4. petla. Mostly my code matches yours.
4a. I did change to sub $'0', %%al instead of just using $48. It does the same thing, but subtracting '0' seems to me to be more "self-documenting."
4b. I also slightly reordered things to put the shr at the end. Why do that? You use cmp along with jl to see when it's time to exit the loop. The way cmp works is that it sets some flags in the flags register, then jl looks at those flags to figure out whether to jump or not. However shr sets those flags too. Each time you shift, you are moving that '1' further and further to the right. What happens when it's at the rightmost position and you shift it 1 more? You get zero. At that point the zero flag is set, so a "jump if not zero" (aka jnz) falls through and the loop ends. Since you have to do the shr anyway, why not use it to tell you when to exit the loop too?
That gives me:
"petla%=:\n\t"
"mov (%[pstr], %%rcx, 1), %%al\n\t" // read the next char
"sub $'0', %%al\n\t" // convert char to value
"mul %%bl\n\t" // mul bl * al -> ax
"add %%ax, %[res]\n\t" // Accumulate result
"inc %%rcx\n\t" // move to next char
"shr $1, %%rbx\n\t" // decrease our exponent
"jnz petla%=\n" // Has our exponent gone to 0?
"gotowe%=:"
Lastly, the parameters:
:[res] "=r"(result)
:[pstr] "r"(str), "0"(0)
:"%rax", "%rbx", "%rcx", "cc"
I'm going to store the result in the C variable named result. Since I specify =r with this constraint, I know that it is stored in a register, although I don't know which register the compiler will pick. But I don't need to. I can just refer to it using %[res] and let the compiler sort it out. Likewise I refer to the string using %[pstr]. I could use %0 like you did, except that since I've added result, pstr isn't %0 anymore, it's %1 (result is now %0). This is another reason to use names instead of numbers.
That last bit ("0"(0)) might take a bit of explaining. Using "0" for the constraint (instead of say "r") tells the compiler to put this value into the same place as parameter #0. The (0) says store a zero there before starting the asm. In other words, initialize the register that is going to hold result to 0. Yes, I could do this in the asm. But I prefer to let the compiler do this for me. While it may not matter in a tiny program like this, letting the C compiler do as much work as possible tends to produce the most efficient code.
So, when we wrap this all together, I get:
/*
my_file.c - The goal of this program is to take a string of numbers and
count sum of them but using 2 as a base.
example: "543" -> 5*(2^2)+4*(2^1)+3*(2^0)=31
*/
#include <stdio.h>
void TestOne(const char *str)
{
short result;
// Code assumes str is not NULL. Strings with non-digits and zero
// length strings return 0.
asm(
"xor %%rcx, %%rcx\n" // Zero the strlen count
// Count how many characters in string
"potega%=: \n\t"
"mov (%[pstr], %%rcx), %%bl\n\t" // Read the next char
"test %%bl, %%bl \n\t" // Check for 0 at end of string
"jz kont%= \n\t"
"cmp $'0', %%bl\n\t" // Ensure digit is 0-9
"jl gotowe%=\n\t"
"cmp $'9', %%bl\n\t"
"jg gotowe%=\n\t"
"inc %%rcx \n\t" // Increment index/len
"jmp potega%= \n"
"kont%=:\n\t"
// rcx = the number of character in the string excluding null
"dec %%rcx \n\t" // We want to shift rbx 1 less than pstr length
"jl gotowe%=\n\t" // Check for zero length string
"mov $1, %%rbx\n\t" // Set exponent for first digit
"shl %%cl, %%rbx\n\t"
"xor %%rcx, %%rcx\n" // Reset string index
"petla%=:\n\t"
"mov (%[pstr], %%rcx, 1), %%al\n\t" // read the next char
"sub $'0', %%al\n\t" // convert char to value
"mul %%bl\n\t" // mul bl * al -> ax
"add %%ax, %[res]\n\t" // Accumulate result
"inc %%rcx\n\t" // move to next char
"shr $1, %%rbx\n\t" // decrease our exponent
"jnz petla%=\n" // Has our exponent gone to 0?
"gotowe%=:"
:[res] "=r"(result)
:[pstr] "r"(str), "0"(0)
:"%rax", "%rbx", "%rcx", "cc"
);
printf("Wynik: \"%s\" = %d\n", str, result);
}
int main(){
TestOne("x");
TestOne("");
TestOne("5");
TestOne("54");
TestOne("543");
TestOne("5432");
return 0;
}
Notice: No global variables. And no len1. Just a pointer to the string.
It might be interesting to experiment and see how long a string you can support. Using mul %%bl, add %%ax and short result works for tiny strings like these, but will eventually be insufficient as the strings get longer (requiring eax or rax etc). I'll leave that for you too. Warning: There's a trick when moving 'up' from mul %%bl to mul %%bx.
One last point about why letting the compiler do as much work as possible tends to produce the most efficient code: Sometimes people assume that since they are writing assembler, this will result in faster code than if they write it in C. However, these people fail to take into account the fact that the entire purpose of a C compiler is to turn your C code into assembler. When you turn on optimization (-O2), the compiler is almost certainly going to turn your (well-written) C code into better assembler code than anything you can write by hand.
There are thousands of tweaks and tricks like the ones I've mentioned here. And the people who write compilers know them all. While there are a few places where inline asm can make sense, smart programmers leave this work to the lunatics who write compilers whenever possible. See also this.
I realize this is just a school project and you are only doing what your teacher requires, but since she has elected to use the most difficult way possible to teach you asm, perhaps she failed to mention that the thing you are doing is something you should (almost) never do in real life.
This post turned out longer than I expected. Hopefully there is information here that you can use. And forgive my attempts at Polish labels. Hopefully I haven't said anything obscene...

As somebody pointed out - yes it's a student exercise.
When it comes to my original problem, when I removed the line "add $48, %%al \n" it worked. I also switched to mul %%cl.
When it comes to the rest of the problems you pointed out, I talked with my professor and she slightly changed her mind (or I got the assignment wrong the first time, whichever you find more likely) and now she wanted me to return a value from the inline function and said the integer type was good. It resulted in me writing this piece of code (which actually does what I wanted)
example: "543" -> 5*(2^2)+4*(2^1)+3*(2^0)=31
#include <stdio.h>
char str[] = "543";
const int len = 3;
int len1 = 2;
int result;
int main(){
asm(
"mov $0, %%rbx\n"
"mov $1, %%rcx\n"
"mov $0, %%rdx\n"
"potega: \n"
"inc %%rbx\n"
"shl $1, %%cl\n"
"cmp len1, %%ebx \n"
"jl potega\n"
"mov $0, %%rbx\n"
"petla:\n"
"mov (%0, %%rbx, 1), %%al\n"
"sub $48, %%al\n"
"mul %%cl\n"
"shr $1, %%cl\n"
"add %%al, %%dl\n"
"inc %%rbx\n"
"cmp len, %%ebx\n"
"jl petla\n"
"movl %%edx, result\n"
://"=r"(result)
:"r"(&str), "r"(&result)
:"%rax", "%rbx", "%rcx", "%rdx"
);
printf("Wynik: %d\n", result);
return 0;
}
Also - I do realise that normally you return variables the way it's shown in the comment, but it didn't work, so at my professor's suggestion I wrote the program this way.
Thanks everybody for help!

Related

Calculating the Fibonacci Sequence using inline assembly in C

I've tried to make a simple console program in C (using clang as the compiler) that would use inline assembly to calculate the Fibonacci's number with the index that's entered in the standard input.
#include <stdio.h>
int main()
{
int ulaz;
scanf("%d",&ulaz);
int rezultat;
asm(
"mov %1,%%ecx\n"
".intel_syntax\n"
"mov eax,0\n"
"mov ebx,1\n"
"petlja:\n"
"add eax,ebx\n"
"xchg eax,ebx\n"
"loop petlja\n"
".att_syntax\n"
"mov %%ebx,%0\n"
: "=m" (rezultat)
: "m" (ulaz)
);
printf("%d\n",rezultat);
return 0;
}
It appears to calculate the Fibonacci's numbers, but not with the index the user has entered. For instance, for the input "10", it should output "55" (the 10th Fibonacci's number), but it outputs "89" (which is a Fibonacci's number, but not the 10th Fibonacci's number). Any idea where the error is?
The loop instruction runs the body exactly ecx times, and after n passes eax holds the nth Fibonacci number while ebx holds the (n+1)th. Because you store ebx at the end, you get the number one step past the one you asked for; in effect you go through the loop one more time than you want. Either dec ecx before entering the loop, or store eax instead of ebx.
It is also worth moving the count check to the beginning of the loop block rather than the end, so that an input of 0 skips the loop instead of wrapping around. Combined with storing eax, that would be something like (not checked, just illustrative):
"mov %1,%%ecx\n"
".intel_syntax\n"
"mov eax,0\n"
"mov ebx,1\n"
"loop_start:\n"
"test ecx, ecx\n"
"jz loop_done\n"
"add eax,ebx\n"
"xchg eax,ebx\n"
"dec ecx\n"
"jmp loop_start\n"
"loop_done:\n"
".att_syntax\n"
"mov %%eax,%0\n"

multiplication instruction error in inline assembly

Consider following program:
#include <stdio.h>
int main(void) {
int foo = 10, bar = 15;
__asm__ __volatile__("add %%ebx,%%eax"
:"=a"(foo)
:"a"(foo), "b"(bar)
);
printf("foo+bar=%d\n", foo);
}
I know that the add instruction is used for addition, the sub instruction for subtraction, and so on. But I didn't understand these lines:
__asm__ __volatile__("add %%ebx,%%eax"
:"=a"(foo)
:"a"(foo), "b"(bar)
);
What is the exact meaning of :"=a"(foo) :"a"(foo), "b"(bar) ); ? What does it do? And when I try to use the mul instruction here, I get the following error for the following program:
#include <stdio.h>
int main(void) {
int foo = 10, bar = 15;
__asm__ __volatile__("mul %%ebx,%%eax"
:"=a"(foo)
:"a"(foo), "b"(bar)
);
printf("foo*bar=%d\n", foo);
}
Error: number of operands mismatch for `mul'
So, why am I getting this error? How do I solve it? I've searched on Google, but I couldn't find a solution to my problem. I am using Windows 10 and the processor is an Intel Core i3.
What is the exact meaning of :"=a"(foo) :"a"(foo), "b"(bar) );
There is a detailed description of how parameters are passed to the asm instruction here. In short, this is saying that bar goes into the ebx register, foo goes into eax, and after the asm is executed, eax will contain an updated value for foo.
Error: number of operands mismatch for `mul'
Yeah, that's not the right syntax for mul. Perhaps you should spend some time with an x86 assembler reference manual (for example, this).
I'll also add that using inline asm is usually a bad idea.
Edit: I can't fit a response to your question into a comment.
I'm not quite sure where to start. These questions seem to indicate that you don't have a very good grasp of how assembler works at all. Trying to teach you asm programming in a SO answer is not really practical.
But I can point you in the right direction.
First of all, consider this bit of asm code:
movl $10, %eax
movl $15, %ebx
addl %ebx, %eax
Do you understand what that does? What will be in eax when this completes? What will be in ebx? Now, compare that with this:
int foo = 10, bar = 15;
__asm__ __volatile__("add %%ebx,%%eax"
:"=a"(foo)
:"a"(foo), "b"(bar)
);
By using the "a" constraint, you are asking gcc to move the value of foo into eax. By using the "b" constraint you are asking it to move bar into ebx. It does this, then executes the instructions for the asm (ie add). On exit from the asm, the new value for foo will be in eax. Get it?
Now, let's look at mul. According to the docs I linked you to, we know that the syntax is mul value. That seems weird, doesn't it? How can there only be one parameter to mul? What does it multiply the value with?
But if you keep reading, you see "Always multiplies EAX by a value." Ahh. So the "eax" register is always implied here. So if you were to write mul %ebx, that would really mean mul ebx, eax, but since it always has to be eax, there's no real point in writing it out.
However, it's a little more complicated than that. ebx can hold a 32-bit number. Since we are using ints (instead of unsigned ints), that means that ebx could have a number as big as 2,147,483,647. But wait, what happens if you multiply 2,147,483,647 * 10? Well, since 2,147,483,647 is already as big a number as you can store in a register, the result is much too big to fit into eax. So the multiplication (always) uses 2 registers to output the result from mul. This is what that link meant when it said the instruction "stores the result in EDX:EAX."
So, you could write your multiplication like this:
int foo = 10, bar = 15;
int upper;
__asm__ ("mul %%ebx"
:"=a"(foo), "=d"(upper)
:"a"(foo), "b"(bar)
:"cc"
);
As before, this puts bar in ebx and foo in eax, then executes the multiplication instruction.
And after the asm is done, eax will contain the lower part of the result and edx will contain the upper. If foo * bar <= 2,147,483,647 (and neither value is negative, since mul is an unsigned multiply), then foo will contain the result you need and upper will be zero. Otherwise, things get more complicated.
But that's as far as I'm willing to go. Other than that, take an asm class. Read a book.
PS You might also look at this answer and the 3 comments that follow that show why even your "add" example is "wrong."
PPS If this answer has resolved your question, don't forget to click the check mark next to it so I get my karma points.

Assembly loop through a string to count characters

I'm trying to write assembly code that counts how many characters are in a string, but I get an error.
Code (I use gcc and intel_syntax):
#include <stdio.h>
int main(){
char *s = "aqr b qabxx xryc pqr";
int x;
asm volatile (
".intel_syntax noprefix;"
"mov eax, %1;"
"xor ebx,ebx;"
"loop:"
"mov al,[eax];"
"or al, al;"
"jz print;"
"inc ebx;"
"jmp loop"
"print:"
"mov %0, ebx;"
".att_syntax prefix;"
: "=r" (x)
: "r" (s)
: "eax", "ebx"
);
printf("Length of string: %d\n", x);
return 0;
}
And I got this error:
Error: invalid use of register
Ultimately I want to write a program that searches for the regex pattern ([pq][^a]+a) and prints its start position and length. I wrote it in C, but I have to make it work in assembly:
My C code:
#include <stdio.h>
#include <string.h>
int main(){
char *s = "aqr b qabxx xryc pqr";
int y,i;
int x=-1,length=0, pos = 0;
int len = strlen(s);
for(i=0; i<len;i++){
if((s[i] == 'p' || s[i] == 'q') && length<=0){
pos = i;
length++;
continue;
} else if((s[i] != 'a') && pos>0){
length++;
} else if((s[i] == 'a') && pos>0){
length++;
if(y < length) {
y=length;
length = 0;
x = pos;
pos = 0;
}
else
length = 0;
pos = 0;
}
}
printf("position: %d, length: %d", x, y);
return 0;
}
You omitted the semicolon after jmp loop and print:.
Also your asm isn't going to work correctly. You move the pointer to s into eax, but then you overwrite it with mov al,[eax]. So the next pass thru the loop, eax doesn't point to the string anymore.
And when you fix that, you need to think about the fact that each pass thru the loop needs to change eax to point to the next character, otherwise mov al,[eax] keeps reading the same character.
Since you haven't accepted an answer yet (by clicking the checkmark to the left), there's still time for one more edit.
Normally I don't "do people's homework", but it's been a few days. Presumably the due date for the assignment has passed. Such being the case, here are a few solutions, both for the education of the OP and for future SO users:
1) Following the (somewhat odd) limitations of the assignment:
asm volatile (
".intel_syntax noprefix;"
"mov eax, %1;"
"xor ebx,ebx;"
"cmp byte ptr[eax], 0;"
"jz print;"
"loop:"
"inc ebx;"
"inc eax;"
"cmp byte ptr[eax], 0;"
"jnz loop;"
"print:"
"mov %0, ebx;"
".att_syntax prefix;"
: "=r" (x)
: "r" (s)
: "eax", "ebx"
);
2) Violating some of the assignment rules to make slightly better code:
asm (
"\n.intel_syntax noprefix\n\t"
"mov eax, %1\n\t"
"xor %0,%0\n\t"
"cmp byte ptr[eax], 0\n\t"
"jz print\n"
"loop:\n\t"
"inc %0\n\t"
"inc eax\n\t"
"cmp byte ptr[eax], 0\n\t"
"jnz loop\n"
"print:\n"
".att_syntax prefix"
: "=r" (x)
: "r" (s)
: "eax", "cc", "memory"
);
This uses 1 fewer register (no ebx) and omits the (unnecessary) volatile qualifier. It also adds the "cc" clobber to indicate that the code modifies the flags, and uses the "memory" clobber to ensure that any 'pending' writes to s get flushed to memory before executing the asm. It also uses formatting (\n\t) so the output from building with -S is readable.
3) Advanced version which uses even fewer registers (no eax), checks to ensure that s is not NULL (returns -1), uses symbolic names and assumes -masm=intel which results in more readable code:
__asm__ (
"test %[string], %[string]\n\t"
"jz print\n"
"loop:\n\t"
"inc %[length]\n\t"
"cmp byte ptr[%[string] + %[length]], 0\n\t"
"jnz loop\n"
"print:"
: [length] "=r" (x)
: [string] "r" (s), "[length]" (-1)
: "cc", "memory"
);
Getting rid of the (arbitrary and not well thought out) assignment constraints allows us to reduce this to 7 lines (5 if we don't check for NULL, 3 if we don't count labels [which aren't actually instructions]).
There are ways to improve this even further (using %= on the labels to avoid possible duplicate symbol issues, using local labels (.L), even writing it so it works for both -masm=intel and -masm=att, etc.), but I daresay that any of these 3 are better than the code in the original question.
Well Kuba, I'm not sure what more you are after here before you'll accept an answer. Still, it does give me the chance to include Peter's version.
4) Pointer increment:
__asm__ (
"cmp byte ptr[%[string]], 0\n\t"
"jz .Lprint%=\n"
".Loop%=:\n\t"
"inc %[length]\n\t"
"cmp byte ptr[%[length]], 0\n\t"
"jnz .Loop%=\n"
".Lprint%=:\n\t"
"sub %[length], %[string]"
: [length] "=&r" (x)
: [string] "r" (s), "[length]" (s)
: "cc", "memory"
);
This does not do the 'NULL pointer' check from #3, but it does do the 'pointer increment' that Peter was recommending. It also avoids potential duplicate symbols (using %=), and uses 'local' labels (ones that start with .L) to avoid extra symbols getting written to the object file.
From a "performance" point of view, this might be slightly better (I haven't timed it). However from a "school project" point of view, the clarity of #3 seems like it would be a better choice. From a "what would I write in the real world if for some bizarre reason I HAD to write this in asm instead of just using a standard c function" point of view, I'd probably look at usage, and unless this was performance critical, I'd be tempted to go with #3 in order to ease future maintenance.

Inline assembly: clarification of constraint modifiers

Two questions:
(1) If I understand ARM inline assembly correctly, a constraint of "r" says that the instruction operand can only be a core register and that by default is a read-only operand. However, I've noticed that if the same instruction has an output operand with the constraint "=r", the compiler may re-use the same register. This seems to violate the "read-only" attribute. So my question is: Does "read-only" refer to the register, or to the C variable that it is connected to?
(2) Is it correct to say that presence of "&" in the constraint of "=&r" simply requires that the register chosen for the output operand must not be the same as one of the input operand registers? My question relates to the code below used to compute the integer power function: i.e., are the "&" constraint modifiers necessary/appropriate?
asm (
" MOV %[power],1 \n\t"
"loop%=: \n\t"
" CBZ %[exp],done%= \n\t"
" LSRS %[exp],%[exp],1 \n\t"
" IT CS \n\t"
" MULCS %[power],%[power],%[base] \n\t"
" MUL %[base],%[base],%[base] \n\t"
" B loop%= \n\t"
"done%=: "
: [power] "+&r" (power),
[base] "+&r" (base),
[exp] "+&r" (exp)
:
: "cc"
) ;
Thanks!
Dan
Read-only refers to the use of the operand in assembly code. The assembly code can only read from the operand, and it must do so before any normal output operand (not an early clobber or a read/write operand) is written. This is because, as you've seen, the same register can be allocated to both an input and output operand. The assumption is that inputs are fully consumed before any output is written, which is normally the case for an assembly instruction.
I don't think using an early-clobber modifier & with a read/write modifier + has any effect since a register allocated to a read/write operand can't be used for anything else.
Here's how I'd write your code:
unsigned power = 1;
asm (
" CBZ %[exp],done%= \n\t"
"loop%=: \n\t"
" LSRS %[exp],%[exp],1 \n\t"
" IT CS \n\t"
" MULCS %[power],%[power],%[base] \n\t"
" MUL %[base],%[base],%[base] \n\t"
" BNE loop%= \n\t"
"done%=: "
: [power] "+r" (power),
[base] "+r" (base),
[exp] "+r" (exp)
:
: "cc"
) ;
Note the transformation of putting the loop test at the end of the loop, saving one instruction. Without it the code doesn't have any obvious improvement over what the compiler can generate. I also let the compiler do the initialization of the register used for the power operand. There's a small chance it will be able to allocate a register that already has the value 1 in it.
Thanks to all of you for the clarification. Just to be sure that I have it right, would it be correct to say that the choice between "=r" and "+r" for an output operand comes down to how the corresponding register is first used in the assembly template? I.e.,
"=r": The first use of the register is as a write-only output of an instruction.
The register may be re-used later by another instruction as an input or output. Adding an early clobber constraint (e.g., "=&r") prevents the compiler from assigning a register that was previously used as an input operand.
"+r": The first use of the register is as an input to an instruction, but the register is used again later as an output.
Best,
Dan

Multithreading with inline assembly and access to a c variable

I'm using inline assembly to construct a set of passwords, which I will use to brute force against a given hash. I used this website as a reference for the construction of the passwords.
This works flawlessly in a single-threaded environment. It produces an endless sequence of incrementing passwords.
I have only basic knowledge of asm, but I understand the idea. gcc uses AT&T syntax by default, so I compile with -masm=intel.
While attempting to multithread the program, I realized that this approach might not work.
The following code uses 2 global C variables, and I assume that this might be the problem.
__asm__("pushad\n\t"
"mov edi, offset plaintext\n\t" <---- global variable
"mov ebx, offset charsetTable\n\t" <---- again
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
It produces a non deterministic result in the plaintext variable.
How can I create a workaround so that every thread accesses its own plaintext variable? (If this is the problem...)
I tried modifying this code to use extended assembly, but I failed every time, probably because all the tutorials use AT&T syntax.
I would really appreciate any help, as I've been stuck for several hours now :(
Edit: Running the program with 2 threads, and printing the content of plaintext right after the asm instruction, produces:
b
b
d
d
f
f
...
Edit2:
pthread_create(&thread[i], NULL, crack, (void *) &args[i]))
[...]
void *crack(void *arg) {
struct threadArgs *param = arg;
struct crypt_data crypt; // storage for reentrant version of crypt(3)
char *tmpHash = NULL;
size_t len = strlen(param->methodAndSalt);
size_t cipherlen = strlen(param->cipher);
crypt.initialized = 0;
for(int i = 0; i <= LIMIT; i++) {
// intel syntax
__asm__ ("pushad\n\t"
//mov edi, offset %0\n\t"
"mov edi, offset plaintext\n\t"
"mov ebx, offset charsetTable\n\t"
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
tmpHash = crypt_r(plaintext, param->methodAndSalt, &crypt);
if(0 == memcmp(tmpHash+len, param->cipher, cipherlen)) {
printf("success: %s\n", plaintext);
break;
}
}
return 0;
}
Since you're already using pthreads, another option is making the variables that are modified by several threads into per-thread variables (threadspecific data). See pthread_getspecific OpenGroup manpage. The way this works is like:
In the main thread (before you create other threads), do:
static pthread_key_t tsd_key;
(void)pthread_key_create(&tsd_key, NULL); /* unlikely to fail; handle if you want */
and then within each thread, where you use the plaintext / charsetTable variables (or more such), do:
struct { char *plainText; char *charsetTable; } *str =
pthread_getspecific(tsd_key);
if (str == NULL) {
str = malloc(sizeof *str);
str->plainText = malloc(size_of_plaintext);
str->charsetTable = malloc(size_of_charsetTable);
initialize(str->plainText); /* put the data for this thread in */
initialize(str->charsetTable); /* ditto */
pthread_setspecific(tsd_key, str);
}
char *plaintext = str->plainText;
char *charsetTable = str->charsetTable;
Or create / use several keys, one per such variable; in that case, you don't get the str container / double indirection / additional malloc.
Intel assembly syntax with gcc inline asm is, hm, not great; in particular, specifying input/output operands is not easy. I think to get that to use the pthread_getspecific mechanism, you'd change your code to do:
__asm__("pushad\n\t"
"push tsd_key\n\t" <---- threadspecific data key (arg to call)
"call pthread_getspecific\n\t" <---- gets "str" as per above
"add esp, 4\n\t" <---- get rid of the func argument
"mov edi, [eax]\n\t" <---- first ptr == "plainText"
"mov ebx, [eax + 4]\n\t" <---- 2nd ptr == "charsetTable"
...
That way, it becomes lock-free, at the expense of using more memory (one plaintext / charsetTable per thread), and the expense of an additional function call (to pthread_getspecific()). Also, if you do the above, make sure you free() each thread's specific data, for example by passing a destructor function as the second argument to pthread_key_create(), or else you'll leak.
If your function is fast to execute, then a lock is a much simpler solution because you don't need all the setup / cleanup overhead of threadspecific data; if the function is either slow or very frequently called, the lock would become a bottleneck though - in that case the memory / access overhead for TSD is justified. Your mileage may vary.
Protect this function with a mutex, outside of the inline assembly block.
