I want to verify the role of volatile with this method, but my inline assembly doesn't seem to be able to modify the value of i without the compiler knowing. According to the article I read, I only need to write assembly code like __asm { mov dword ptr [ebp-4], 20h }, and I think I wrote the equivalent of what the author did.
actual output:
before = 10
after = 123
expected output:
before = 10
after = 10
Article link: https://www.runoob.com/w3cnote/c-volatile-keyword.html
#include <stdio.h>
int main() {
int a, b;
// volatile int i = 10;
int i = 10;
a = i;
printf("before = %d\n", a);
// Change the value of i in memory without letting the compiler know.
// I can't run the following statement here, so I wrote one myself
// mov dword ptr [ebp-4], 20h
asm("movl $123, -12(%rbp)");
b = i;
printf("after = %d\n", b);
}
I want to verify the role of volatile ...
You can't.
If a variable is not volatile, the compiler is allowed to optimize accesses to it, but it is not required to.
A compiler may always treat any variable as volatile.
How to change the value of a variable without the compiler knowing?
Create a second thread writing to the variable.
Example
The following example is for Linux (under Windows, you need a different function than pthread_create()):
#include <stdio.h>
#include <pthread.h>
int testVar;
volatile int waitVar;
void * otherThread(void * dummy)
{
while(waitVar != 2) { /* Wait */ }
testVar = 123;
waitVar = 3;
return NULL;
}
int main()
{
pthread_t pt;
waitVar = 1;
pthread_create(&pt, 0, otherThread, NULL);
testVar = 10;
waitVar = 2;
while(waitVar != 3) { /* Wait */ }
printf("%d\n", testVar - 10);
return 0;
}
If you compile with gcc -O0 -o x x.c -lpthread, the compiler does not optimize and behaves as if all variables were volatile. printf() prints 113.
If you compile with -O3 instead of -O0, printf() prints 0.
If you replace int testVar by volatile int testVar, printf() always prints 113 (independent of -O0/-O3).
(Tested with the GCC 9.4.0 compiler.)
I am wondering if the following example is a Clang SA false positive, and if so, is there a way to suppress it?
The key here is that I am copying a structure containing bit-fields by casting it to a word instead of doing a field-by-field copy (or memcpy). Neither the field-by-field copy nor memcpy triggers a warning, but copying as a word (after casting) raises an "uninitialized access" warning. This is on an embedded system where only word access is possible and these kinds of word copies are commonplace.
Below is the example code:
#include <stdio.h>
#include <string.h>
struct my_fields_t {
unsigned int f0: 16;
unsigned int f1: 8;
unsigned int f2: 8;
};
int main(void) {
struct my_fields_t var1, var2;
// initialize all the fields in var1.
var1.f0 = 1;
var1.f1 = 2;
var1.f2 = 3;
// Method #1: copy var1 -> var2 as a word (sizeof(unsigned int) = 4).
unsigned int *src = (unsigned int *) &var1;
unsigned int *dest = (unsigned int *) &var2;
*dest = *src;
// Method #2: copy var1->var2 field-by-field [NO SA WARNINGS]
// var2.f0 = var1.f0;
// var2.f1 = var1.f1;
// var2.f2 = var1.f2;
// Method #3: use memcpy to copy var1 to var2 [NO SA WARNINGS]
// memcpy(&var2, &var1, sizeof(struct my_fields_t));
printf("%d, %d, %d\n", var1.f0, var1.f1, var1.f2);
printf("%d, %d, %d\n", var2.f0, var2.f1, var2.f2); // <--- Function call argument is an uninitialized value
printf("sizeof(unsigned int) = %zu\n", sizeof(unsigned int));
}
Here's the output:
$ clang --version
clang version 4.0.0 (tags/RELEASE_401/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang -Wall clang_sa.c
$ ./a.out
1, 2, 3
1, 2, 3
sizeof(unsigned int) = 4
$ scan-build clang clang_sa.c
scan-build: Using '<snipped>/clang-4.0' for static analysis
clang_sa.c:33:3: warning: Function call argument is an uninitialized value
printf("%d, %d, %d\n", var2.f0, var2.f1, var2.f2); // <--- Function call argument is an uninitialized value
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.
In the above example, it is quite clear that all the fields in var2 will be initialized by the word copy, so Clang SA shouldn't complain about uninitialized access.
I appreciate any help/insight.
In terms of suppressing a specific warning, from the documentation:
Q: How can I suppress a specific analyzer warning?
There is currently no solid mechanism for suppressing an analyzer warning, although this is currently being investigated. ...
But on the next question, it shows you that you can mark a block of code to be skipped over during static analysis by surrounding the code with an #ifdef block:
Q: How can I selectively exclude code the analyzer examines?
When the static analyzer is using clang to parse source files, it implicitly defines the preprocessor macro __clang_analyzer__. One can use this macro to selectively exclude code the analyzer examines. ...
So, you could do it like this:
#ifdef __clang_analyzer__
#define COPY_STRUCT(DEST, SRC) (DEST) = (SRC)
#else
#define COPY_STRUCT(DEST, SRC) do { \
const unsigned int *src = (const void *)&(SRC); \
unsigned int *dest = (void *)&(DEST); \
*dest = *src; \
} while(0)
#endif
COPY_STRUCT(var2, var1);
Is it possible to store a function pointer's contents in C? I know you can store every kind of pointer in a variable. But if I can "unwrap" an integer pointer (to an integer) or a string pointer (to an unsigned char), wouldn't I be able to decode a function pointer too?
To be clearer, I mean storing the machine code instructions in a variable.
You're missing an important fact: A function isn't a (first-class) object in C.
There are two basic types of pointers in C: Data pointers and function pointers. Both can be dereferenced using *.
The similarities end here. A data object has a stored value, so dereferencing a data pointer accesses this value:
int a = 5;
int *b = &a;
int c = *b; // 5
A function is just this, a function. You can call a function, so you can call the result of dereferencing a function pointer. It doesn't have a stored value:
int x(void) { return 1; }
int (*y)(void) = &x; // valid also without the address-of operator
// ...
int main(void)
{
int a = (*y)(); // valid also without explicit dereference like int a = y();
}
For ease of handling, C allows omitting the & operator when assigning a function to a function pointer and also omitting the explicit dereference when calling a function through a function pointer.
In short: using pointers doesn't change anything about the semantics of data objects vs functions.
Also note in this context that function and data pointers aren't compatible. You can't assign a function pointer to void *. It's even possible to have a platform where a function pointer has a different size from a data pointer.
In practice, on a platform where a function pointer has the same format as a data pointer, you could "convince" your compiler to access the actual binary code located there by casting your pointer to const char *. But be aware this is undefined behavior.
A pointer in C is the address of some object in memory. An int * is the address of an int, a pointer to a function is the address where the code of the function is stored in memory.
While you can read some bytes from the address of a function in memory, they are just bytes and nothing else. You need to know how to interpret these bytes in order to "store the machine code instructions in a variable". And the real problem here is to know where to stop, where the code of one function ends and the code of another function begins.
These things are not defined by the language and they depend on many factors: the processor architecture, the OS, the compiler, and the compiler flags used to compile the code (for optimizations, for example).
The real question here is: assuming you can "store the machine code instructions in a variable" how do you want to use it? It is just a sequence of bytes meaningless for most humans and it cannot be used to execute the function. If you are not writing a compiler, linker, emulator, operating system or something similar, there is nothing useful you can do with the machine code instruction of a function. (And if you are writing one of the above then you know the answer and you do not ask such questions on SO or somewhere else.)
Assume we are talking about von Neumann architecture.
Basically we have a single memory which contains both instructions and data. However modern OSes are able to control memory access permissions (read/write/execute).
As far as the standard is concerned, casting a function pointer to a data pointer is undefined behaviour. However, with, say, Linux, gcc and a modern x86-64 CPU, you can do such a conversion; what you get is a pointer into a read-only, executable segment of memory.
For instance take a look at this simple program:
#include <stdio.h>
int func() {
return 1;
}
int main() {
unsigned char * code = (void*)func;
printf("%02x\n%02x%02x%02x\n%02x%02x%02x%02x%02x\n%02x\n%02x\n",
*code,
*(code+1), *(code+2), *(code+3),
*(code+4), *(code+5), *(code+6), *(code+7), *(code+8),
*(code+9),
*(code+10));
}
Compiled with:
gcc -O0 -o tst tst.c
Its output on my machine is:
55 // push rbp
4889e5 // mov rbp, rsp
b801000000 // mov eax, 0x1
5d // pop rbp
c3 // ret
As you can see, this is indeed our function.
Since the OS lets you mark memory as executable, you can in fact write your functions at run time: all you need to do is generate opcodes for the current platform and mark the memory executable. This is exactly how JIT compilers work. For an excellent example of such a compiler, take a look at LuaJIT.
The code here is a skeleton for injecting code into a program. But if you run it on an OS such as Linux or Windows, you will get an exception before the first instruction that fn_ptr points to is executed.
#include <stdio.h>
#include <malloc.h>
typedef int FN(void);
int main(void)
{
FN * fn_ptr;
char * x;
fn_ptr = malloc(10240);
x = (char *)fn_ptr;
// ... Insert code into x that points the same memory of fn_ptr;
x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
fn_ptr();
return 0;
}
If you execute this code using gdb, you obtain this result:
(gdb) l
2 #include <malloc.h>
3
4 typedef int FN(void);
5
6 int main(void)
7 {
8 FN * fn_ptr;
9 char * x;
10
11 fn_ptr = malloc(10240);
12 x = (char *)fn_ptr;
13
14 // ... Insert code into x that points the same memory of fn_ptr;
15 x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
16 fn_ptr();
17
18 return 0;
19 }
(gdb) b 11
Breakpoint 1 at 0x400535: file p.c, line 11.
(gdb) r
Starting program: /home/sergio/a.out
Breakpoint 1, main () at p.c:11
11 fn_ptr = malloc(10240);
(gdb) p fn_ptr
$1 = (FN *) 0x7fffffffde30
(gdb) n
12 x = (char *)fn_ptr;
(gdb) n
15 x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
(gdb) p x[0]
$3 = 0 '\000'
(gdb) n
16 fn_ptr();
(gdb) p x[0]
$5 = -21 '\353'
(gdb) p x[1]
$6 = -2 '\376'
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
0x0000000000602010 in ?? ()
(gdb) where
#0 0x0000000000602010 in ?? ()
#1 0x0000000000400563 in main () at p.c:16
(gdb)
As you can see, GDB reports a SIGSEGV (segmentation fault) at the address fn_ptr points to, although the instructions we placed in that memory are valid. The reason is that memory returned by malloc is typically not marked executable.
Note that the machine code EB FE is valid for Intel (or compatible) processors only. It corresponds to the assembly instruction jmp $.
This is an example of using function pointers where machine code is copied into a memory area and executed.
The program below doesn't do anything special. It runs the code stored in the array prg[][] by copying it into a memory-mapped area. It uses two function pointers, fnI_ptr and fnD_ptr, both pointing to the same memory area. The program copies each of the two machine-code routines into that memory in turn and then executes the loaded code.
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#include <sys/mman.h>
#include <stdint.h>
#include <inttypes.h>
typedef int FNi(int,int);
typedef double FNd(double,double);
const unsigned char prg[][250] = { // unsigned: values such as 0x89 do not fit in a signed char
// int multiply(int x,int y)
{
0x55, // push %rbp
0x48,0x89,0xe5, // mov %rsp,%rbp
0x89,0x7d,0xfc, // mov %edi,-0x4(%rbp)
0x89,0x75,0xf8, // mov %esi,-0x8(%rbp)
0x8B,0x45,0xfc, // mov -0x4(%rbp),%eax
0x0f,0xaf,0x45,0xf8, // imul -0x8(%rbp),%eax
0x5d, // pop %rbp
0xc3 // retq
},
// double multiply(double x,double y)
{
0x55, // push %rbp
0x48,0x89,0xe5, // mov %rsp,%rbp
0xf2,0x0f,0x11,0x45,0xf8, // movsd %xmm0,-0x8(%rbp)
0xf2,0x0f,0x11,0x4d,0xf0, // movsd %xmm1,-0x10(%rbp)
0xf2,0x0f,0x10,0x45,0xf8, // movsd -0x8(%rbp),%xmm0
0xf2,0x0f,0x59,0x45,0xf0, // mulsd -0x10(%rbp),%xmm0
0xf2,0x0f,0x11,0x45,0xe8, // movsd %xmm0,-0x18(%rbp)
0x48,0x8b,0x45,0xe8, // mov -0x18(%rbp),%rax
0x48,0x89,0x45,0xe8, // mov %rax,-0x18(%rbp)
0xf2,0x0f,0x10,0x45,0xe8, // movsd -0x18(%rbp),%xmm0
0x5d, // pop %rbp
0xc3 // retq
}
};
int main(void)
{
#define FMT "0x%016"PRIX64
int ret=0;
FNi * fnI_ptr=NULL;
FNd * fnD_ptr=NULL;
void * x=NULL;
//uint64_t p = PAGE(K), l = p*4; //Max memory to use!
uint64_t p = 0, l = 0, line=0; //Max memory to use!
do {
p = getpagesize();line = __LINE__;
if (!p) {
ret=line;
break;
}
l=p*2;
printf("Mem page size = "FMT"\n",p);
printf("Mem alloc size = "FMT"\n\n",l);
x = mmap(NULL, l, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);line = __LINE__;
if (x==MAP_FAILED) {
x=NULL;
ret=line;
break;
}
//Prepares function-pointers. They point the same memory! :)
fnI_ptr=(FNi *)x;
fnD_ptr=(FNd *)x;
printf("from x="FMT" to "FMT"\n\n",(uint64_t)(uintptr_t)x,(uint64_t)(uintptr_t)x + l);
// Calling the functions coded into the array prg
puts("Copying prg[0]");
// It injects the function prg[0]
memcpy(x,prg[0],sizeof(prg[0]));
// It executes the injected code
printf("executing int-mul = %d\n",fnI_ptr(10,20));
puts("--------------------------");
puts("Copying prg[1]");
// It injects the function prg[1]
memcpy(x,prg[1],sizeof(prg[1]));
//Prepares function pointers.
// It executes the injected code
printf("executing dbl-mul = %f\n\n",fnD_ptr(12.3,3.21));
} while(0); // Single-pass loop: break jumps to the cleanup below when an error occurs
if (x!=NULL)
munmap(x,l);
if (ret) {
printf("[line"
"=%d] Error %d - %s\n",ret,errno,strerror(errno));
}
return errno;
}
prg[][] contains two machine-code functions:
The first multiplies two integer values and returns an integer result.
The second multiplies two double-precision values and returns a double-precision result.
I won't discuss portability here. The code in prg[][] was obtained with objdump -S prgname > prgname.s from an object file compiled without optimization with gcc ( gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 ) from the following code:
int multiply(int a, int b)
{
return a*b;
}
double dMultiply(double a, double b)
{
return a*b;
}
The above code was compiled on a PC with a 64-bit Intel i3 CPU running Linux (3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64).
I need the location of a code section in the executable (begin and end address). I tried to use two dummy functions:
void begin_address(){}
void f(){
...
}
void end_address(){}
...
printf("Function length: %td\n", (intptr_t)end_address - (intptr_t)begin_address);
The problem is, that using -O4 optimization with gcc I got a negative length. It seems that this does not work with optimizations.
I compiled f to assembly, and tried the following:
__asm__(
"func_begin:"
"movq $10, %rax;"
"movq $20, %rbx;"
"addq %rbx, %rax;"
"func_end:"
);
extern unsigned char* func_begin;
extern unsigned char* func_end;
int main(){
printf("Function begin and end address: %p\t%p\n", func_begin, func_end);
printf("Function length: %td\n", (intptr_t)func_end - (intptr_t)func_begin);
}
The problem is that even without optimization I am getting some strange output:
Function begin and end address: 0x480000000ac0c748 0xf5158b48e5894855
Function length: -5974716185612615411
How can I get the location of a function in the executable? My second question is whether referring to this address as const char* is safe or not. I am interested in both 32 and 64 bit solutions if there is a difference.
If you want to see how many bytes a function occupies in a binary, you can use objdump to disassemble the binary and look at the first and last instruction addresses of the function. Or you can print $ebp - $esp if you want to know how much space a function uses on the stack.
If it is a viable option for you, tell gcc to compile the needed parts with -O0 instead:
#include <stdio.h>
#include <stdint.h>
void __attribute__((optimize("O0"))) func_begin(){}
void __attribute__((optimize("O0"))) f(){
return;
}
void __attribute__((optimize("O0"))) func_end(){}
int main()
{
printf("Function begin and end address: %p\t%p\n", func_begin, func_end);
printf("Function length: %td\n", (uintptr_t)func_end - (uintptr_t)func_begin);
}
I'm not sure whether __attribute__((optimize("O0"))) is needed for f().
I don't know about GCC, but in the case of some Microsoft compilers, or some versions of Visual Studio, if you build in debug mode, it creates a jump table for function entries that then jump to the actual function. In release mode, it normally doesn't use the jump table.
I thought most linkers have a map output option that would at least show the offsets of functions.
You could use an asm instruction that you could search for:
movel $12345678,eax ;search for this instruction
This worked with Microsoft C / C++ 4.1, VS2005, and VS2010 release builds:
#include <stdio.h>
#include <string.h>
void swap(char **a, char **b){
char *temp = *a;
*a = *b;
*b = temp;
}
void sortLine(char *a[], int size){
int i, j;
for (i = 0; i < size; i++){
for (j = i + 1; j < size; j++){
if(memcmp(a[i], a[j], 80) > 0){
swap(&a[i], &a[j]);
}
}
}
}
int main(int argc, char **argv)
{
void (*pswap)(char **a, char **b) = swap;
void (*psortLine)(char *a[], int size) = sortLine;
char *pfun1 = (void *) pswap;
char *pfun2 = (void *) psortLine;
printf("%p %p %x\n", (void *)pfun1, (void *)pfun2, (unsigned)(pfun2 - pfun1));
return(0);
}
This one is about dereferencing structure variables in a chain. Please consider this code:
#include <stdio.h>
struct ChannelInfo
{
    int iData1;
    int iData2;
    int iData3;
    int iData4;
};
struct AppInfo
{
    struct ChannelInfo gChannelInfo[100];
} gAppInfo;
void foo1(void);
void foo2(void);
int main(void)
{
    gAppInfo.gChannelInfo[50].iData1 = 1;
    gAppInfo.gChannelInfo[50].iData2 = 2;
    gAppInfo.gChannelInfo[50].iData3 = 3;
    gAppInfo.gChannelInfo[50].iData4 = 4;
    foo1();
    foo2();
    return 0;
}
/* Reaches each member through the full chain every time. */
void foo1(void)
{
    printf("Data1 = %d, Data2 = %d, Data3 = %d, Data4 = %d\n", gAppInfo.gChannelInfo[50].iData1, gAppInfo.gChannelInfo[50].iData2, gAppInfo.gChannelInfo[50].iData3, gAppInfo.gChannelInfo[50].iData4);
}
/* Takes the element's address once and goes through the pointer. */
void foo2(void)
{
    struct ChannelInfo* pCurrrentChan = &gAppInfo.gChannelInfo[50];
    printf("Data1 = %d, Data2 = %d, Data3 = %d, Data4 = %d\n", pCurrrentChan->iData1, pCurrrentChan->iData2, pCurrrentChan->iData3, pCurrrentChan->iData4);
}
Is foo2() any faster than foo1()? What happens if the array index is not a constant but is supplied by the user? I would be grateful if someone could profile this code.
This assembly version of your code could help you understand why your code is slower. But of course it could vary depending on the target architecture and your optimization flags (compiling with -O2 or -O3 produces the same code for foo1 and foo2).
In foo2 the address of the ChannelInfo element is stored in a register, and addresses are calculated relative to the value in that register. In the worst case it is kept on the stack (as a local variable), in which case it could be as slow as foo1.
In foo1 the addresses of the arguments to printf are calculated relative to the variable gAppInfo, which is stored in main memory (or in cache).
As per @Ludin's request, I added these numbers for reference:
Execution of an instruction : 1 ns
fetch from main memory : ~100 ns
assembly version with -O2 flags ( -Os and -O3 flags produce the same code )
Pondering things like this isn't meaningful; it is premature optimization, because the code will get optimized so that both functions are equivalent.
If for some reason you do not optimize the code, foo2() will be slightly slower because it yields a few more instructions.
Please note that the call to printf is approximately 100 times slower than the rest of the code in that function, so if you are truly concerned about performance you should focus on avoiding stdio.h rather than on these kinds of mini-optimizations.
At the bottom of the answer I have included some benchmarking code for Windows. Because the printf call is so slow compared to the rest of the code, and we aren't really interested in benchmarking printf itself, I removed the printf calls and replaced them with volatile variables. This means the compiler is required to perform the reads regardless of the optimization level.
gcc test.c -otest.exe -std=c11 -pedantic-errors -Wall -Wextra -O0
Output:
foo1 5.669101us
foo2 7.178366us
gcc test.c -otest.exe -std=c11 -pedantic-errors -Wall -Wextra -O2
Output:
foo1 2.509606us
foo2 2.506889us
As we can see, the difference in execution time of the non-optimized code corresponds roughly to the number of assembler instructions produced (see the answer by @dvhh).
Unscientifically:
10 / (10 + 16) instructions = 0.384
5.67 / (5.67 + 7.18) microseconds = 0.441
Benchmarking code:
#include <stdlib.h>
#include <stdio.h>
#include <windows.h>
struct ChannelInfo
{
int iData1;
int iData2;
int iData3;
int iData4;
};
struct AppInfo
{
struct ChannelInfo gChannelInfo[100];
} gAppInfo;
void foo1 (void);
void foo2 (void);
static double get_time_diff_us (const LARGE_INTEGER* freq,
const LARGE_INTEGER* before,
const LARGE_INTEGER* after)
{
return ((after->QuadPart - before->QuadPart)*1000.0) / (double)freq->QuadPart;
}
int main (void)
{
/*** Initialize benchmarking functions ***/
LARGE_INTEGER freq;
if(QueryPerformanceFrequency(&freq)==FALSE)
{
printf("QueryPerformanceFrequency not supported");
return 0;
}
LARGE_INTEGER time_before;
LARGE_INTEGER time_after;
gAppInfo.gChannelInfo[50].iData1 = 1;
gAppInfo.gChannelInfo[50].iData2 = 2;
gAppInfo.gChannelInfo[50].iData3 = 3;
gAppInfo.gChannelInfo[50].iData4 = 4;
const size_t ITERATIONS = 1000000;
QueryPerformanceCounter(&time_before);
for(size_t i=0; i<ITERATIONS; i++)
{
foo1();
}
QueryPerformanceCounter(&time_after);
printf("foo1 %fus\n", get_time_diff_us(&freq, &time_before, &time_after));
QueryPerformanceCounter(&time_before);
for(size_t i=0; i<ITERATIONS; i++)
{
foo2();
}
QueryPerformanceCounter(&time_after);
printf("foo2 %fus\n", get_time_diff_us(&freq, &time_before, &time_after));
}
void foo1 (void)
{
volatile int d1, d2, d3, d4;
d1 = gAppInfo.gChannelInfo[50].iData1;
d2 = gAppInfo.gChannelInfo[50].iData2;
d3 = gAppInfo.gChannelInfo[50].iData3;
d4 = gAppInfo.gChannelInfo[50].iData4;
}
void foo2 (void)
{
struct ChannelInfo* pCurrrentChan = &gAppInfo.gChannelInfo[50];
volatile int d1, d2, d3, d4;
d1 = pCurrrentChan->iData1;
d2 = pCurrrentChan->iData2;
d3 = pCurrrentChan->iData3;
d4 = pCurrrentChan->iData4;
}
Yes, foo2() is definitely faster than foo1(), because foo2 holds a pointer to that memory block, and every time you access it, it just points there and fetches the value from memory.
I'm trying to learn how to write gcc inline assembly.
The following code is supposed to perform an shl instruction and return the result.
#include <stdio.h>
#include <inttypes.h>
uint64_t rotate(uint64_t x, int b)
{
int left = x;
__asm__ ("shl %1, %0"
:"=r"(left)
:"i"(b), "0"(left));
return left;
}
int main()
{
uint64_t a = 1000000000;
uint64_t res = rotate(a, 10);
printf("%llu\n", res);
return 0;
}
Compilation fails with error: impossible constraint in asm
The problem is basically with "i"(b). I've tried "o", "n", "m" among others but it still doesn't work. Either it's this error or operand size mismatch.
What am I doing wrong?
As written, your code compiles correctly for me (I have optimization enabled). However, I believe you may find this to be a bit better:
#include <stdio.h>
#include <inttypes.h>
uint64_t rotate(uint64_t x, int b)
{
__asm__ ("shl %b[shift], %[value]"
: [value] "+r"(x)
: [shift] "Jc"(b)
: "cc");
return x;
}
int main(int argc, char *argv[])
{
uint64_t a = 1000000000;
uint64_t res = rotate(a, 10);
printf("%" PRIu64 "\n", res);
return 0;
}
Note that the 'J' is for 64bit. If you are using 32bit, 'I' is the correct value.
Other things of note:
You are truncating the value from uint64_t to int (int left = x). Are you compiling 32-bit code? I don't believe shl can operate on a 64-bit operand when compiled as 32-bit.
Allowing 'c' on the input constraint means you can use variable shift amounts (i.e. not hard-coded at compile time).
Since shl modifies the flags, use "cc" to let the compiler know.
Using the [name] form makes the asm easier to read (IMO).
The %b is a modifier. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#i386Operandmodifiers
If you want to really get smart about inline asm, check out the latest gcc docs: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html