I want to know how the bitwise shift operators "<<" and ">>" are implemented in C. Are they atomic or not? Does C shift the whole word at once, or does it move the bits one by one?
Are there any dependencies on the compiler, operating system or computer architecture?
Does the C standard define how the shift operators should be implemented?
Example :
Let's say two threads are accessing the same data, and one of them modifies it by shifting it 3 bits. Is this 3-bit shift an atomic operation or not? Should I use locks to protect the modification?
EDIT: It's only a shift operator, no store instruction. data is already in memory so no load operation.
My processor : Powerpc MPC8569, e600 core architecture.
C only guarantees atomic access for _Atomic type variables, which were introduced in C11.
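For all other situations, there are no guarantees of atomic access. You will have to disassemble the compiled C code to see which assembler instructions were generated. Typically, a single assembler instruction cannot be interrupted halfway on a single core, but that alone does not make a read-modify-write of memory safe against other cores.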
But your question doesn't make all that much sense, because there is no context. Where would the result of the shift go? Do you plan to store it somewhere? Then that's two operations: shift and store. Possibly also a load. If you write an algorithm which is not atomic in itself, how do you expect the compiler to magically make it atomic for you?
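To make the "shift and store" point concrete, here is a minimal sketch of an atomic shift, assuming a C11 compiler that provides <stdatomic.h>. The names shared_word and atomic_shift_left are made up for illustration. <stdatomic.h> has no fetch-and-shift operation, so the sketch uses a compare-exchange loop; it is one possible approach, not the only one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared word that several threads may update. */
static _Atomic uint32_t shared_word = 1u;

/* Atomically replace *target with (*target << count) and return the new value.
 * The load, shift and store are retried until no other thread has changed
 * *target in between. */
static uint32_t atomic_shift_left(_Atomic uint32_t *target, unsigned count)
{
    assert(count < 32u);  /* shifting by the full width or more is undefined */

    uint32_t old = atomic_load(target);
    uint32_t desired;
    do {
        desired = old << count;
        /* On failure, old is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(target, &old, desired));

    return desired;
}
```

A thread would call atomic_shift_left(&shared_word, 3) instead of writing shared_word <<= 3 on a plain variable.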
It depends on the processor that you are using.
If the processor has a bitwise shift instruction, as most x86 cores and 16-bit and 32-bit microcontrollers do, then the shift itself is atomic.
If, however, you have an 8-bit microcontroller without a bit shift instruction, or you are trying to shift a large value (say 64 bits or 128 bits), the operation may well take quite a lot of code.
It depends upon which standard you are talking about.
AFAIU, the only atomic operations (explicitly defined as atomic) in C11 are the ones related to <stdatomic.h>
You could imagine a terahertz processor with a 4-bit ALU; even a simple int32_t addition won't be atomic on it.
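As a rough illustration of why a wide shift on a narrow machine is several operations, here is a sketch that shifts a 64-bit value using only 32-bit operations. The struct u64_halves and the function shift_left_halves are hypothetical names, and a real compiler's runtime helper may be written quite differently.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit-only ALU would see it. */
struct u64_halves {
    uint32_t hi;
    uint32_t lo;
};

/* Shift left by count (0..63) using only 32-bit shifts and ORs. */
static struct u64_halves shift_left_halves(struct u64_halves v, unsigned count)
{
    assert(count < 64u);

    struct u64_halves r;
    if (count == 0u) {
        r = v;                                   /* nothing to do */
    } else if (count < 32u) {
        /* Bits leaving the top of lo must be carried into hi. */
        r.hi = (v.hi << count) | (v.lo >> (32u - count));
        r.lo = v.lo << count;
    } else {
        /* Everything in lo moves into hi; lo becomes zero. */
        r.hi = v.lo << (count - 32u);
        r.lo = 0u;
    }
    return r;
}
```

Another thread that reads the two halves between the update of hi and the update of lo would see a value that never logically existed.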
I wrote two programs. The first is:
#include<stdio.h>
int main()
{
int i = 5;
return 0;
}
The assembly generated for the first program on the PowerPC architecture is:
.file "hello.c"
.section ".text"
.align 2
.globl main
.type main, @function
main:
stwu 1,-32(1)
stw 31,28(1)
mr 31,1
li 0,5
stw 0,8(31)
li 0,0
mr 3,0
lwz 11,0(1)
lwz 31,-4(11)
mr 1,11
blr
.size main, .-main
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
The second program is:
#include<stdio.h>
int main()
{
int i = 5;
i = i<<1;
return 0;
}
The assembly generated for the second program on the PowerPC architecture is:
.file "hello.c"
.section ".text"
.align 2
.globl main
.type main, @function
main:
stwu 1,-32(1)
stw 31,28(1)
mr 31,1
li 0,5
stw 0,8(31)
lwz 0,8(31) // extra
slwi 0,0,1 // extra
stw 0,8(31) // extra
li 0,0
mr 3,0
lwz 11,0(1)
lwz 31,-4(11)
mr 1,11
blr
.size main, .-main
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
You can see there are three extra instructions (a load, a shift and a store), so the operation is not atomic.
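To show why the load, shift and store sequence matters, here is a small single-threaded model of two threads running those three steps. It is only a simulation under assumed names (thread_regs, step_load and so on), not real concurrent code. It plays one good ordering and one bad ordering by hand.

```c
/* Hypothetical model: one shared word in memory, one private register per thread. */
struct thread_regs {
    int r0;
};

static void step_load(struct thread_regs *t, const int *mem)  { t->r0 = *mem; }   /* like lwz  */
static void step_shift(struct thread_regs *t)                 { t->r0 <<= 1; }    /* like slwi */
static void step_store(const struct thread_regs *t, int *mem) { *mem = t->r0; }   /* like stw  */

/* Thread A finishes all three steps before thread B starts: both shifts apply. */
static int run_sequential(int initial)
{
    int mem = initial;
    struct thread_regs a, b;

    step_load(&a, &mem); step_shift(&a); step_store(&a, &mem);
    step_load(&b, &mem); step_shift(&b); step_store(&b, &mem);
    return mem;
}

/* Thread B loads before thread A has stored: one of the two shifts is lost. */
static int run_interleaved(int initial)
{
    int mem = initial;
    struct thread_regs a, b;

    step_load(&a, &mem);
    step_load(&b, &mem);   /* B still sees the old value */
    step_shift(&a);
    step_shift(&b);
    step_store(&a, &mem);
    step_store(&b, &mem);  /* overwrites A's result with the same value */
    return mem;
}
```

Starting from 5, the sequential order ends with 20 while the interleaved order ends with 10, which is the lost update that a lock or an atomic operation is meant to prevent.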
I also compiled this on an Intel i7 PC. Here are the results.
The assembly generated for the first program is:
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $5, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
.section .note.GNU-stack,"",@progbits
The assembly generated for the second program is:
.file "hello.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $5, -4(%rbp)
sall -4(%rbp) // only one extra instruction
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
.section .note.GNU-stack,"",@progbits
So my understanding is that the answer depends on the architecture being used.
Normally it should be SHLD/SHRD - Double Precision Shift (386+)
https://web.itu.edu.tr/kesgin/mul06/intel/instr/shld_shrd.html
I think it's atomic because it's a single instruction. Otherwise you could use atomics if your C supports them (C11 has _Atomic, C++11 has std::atomic). Note that volatile on its own does not make an operation atomic.
Related
I've been trying to get familiar with assembly on mac, and from what I can tell, the documentation is really sparse, and most books on the subject are for windows or linux. I thought I would be able to translate from linux to mac pretty easily, however this (linux)
.file "simple.c"
.text
.globl simple
.type simple, @function
simple:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
addl (%edx), %eax
movl %eax, (%edx)
popl %ebp
ret
.size simple, .-simple
.ident "GCC: (Ubuntu 4.3.2-1ubuntu11) 4.3.2"
.section .note.GNU-stack,"",@progbits
seems pretty different from this (mac)
.section __TEXT,__text,regular,pure_instructions
.globl _simple
.align 4, 0x90
_simple: ## @simple
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
addl (%rdi), %esi
movl %esi, (%rdi)
movl %esi, %eax
popq %rbp
ret
.cfi_endproc
.subsections_via_symbols
The "normal" (for lack of a better word) instructions and registers such as pushq %rbp don't worry me. But the "weird" ones like .cfi_startproc and Ltmp2: which are smack dab in the middle of the machine instructions don't make any sense.
I have no idea where to go to find out what these are and what they mean. I'm about to pull my hair out as I've been trying to find a good resource for beginners for months. Any suggestions?
To begin with, you're comparing 32-bit x86 assembly with 64-bit x86-64. While the OS X Mach-O ABI supports 32-bit IA32, I suspect you want the x86-64 SysV ABI. (Thankfully, the x86-64.org site seems to be up again). The Mach-O x86-64 model is essentially a variant of the ELF / SysV ABI, so the differences are relatively minor for user-space code, even with different assemblers.
The .cfi directives are DWARF debugging directives that you don't strictly need for assembly - they are used for call frame information, etc. Here are some minimal examples:
ELF x86-64 assembler:
.text
.p2align 4
.globl my_function
.type my_function,@function
my_function:
...
.L__some_address:
.size my_function,[.-my_function]
Mach-O x86-64 assembler:
.text
.p2align 4
.globl _my_function
_my_function:
...
L__some_address:
Short of writing an asm tutorial, the main differences between the assemblers are: leading underscores for Mach-O function names, and .L vs L for local labels (destinations). The assembler on OS X understands the .p2align directive; .align 4, 0x90 essentially does the same thing.
Not all the directives in compiler-generated code are essential for the assembler to generate valid object code. They are required to generate stack frame (debugging) and exception handling data. Refer to the links for more information.
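If you need the assembly-level name of a symbol from C code (for example inside inline asm), the leading-underscore difference can be handled with the predefined macro __USER_LABEL_PREFIX__, which GCC and Clang define as the prefix the target adds to C symbol names. The following is a minimal sketch; LABEL_PREFIX, ASM_NAME and my_function_asm_name are hypothetical helper names, and the empty fallback for other compilers is an assumption.

```c
/* GCC and Clang predefine __USER_LABEL_PREFIX__: "_" on Mach-O, nothing on ELF.
 * For compilers that do not define it, this sketch assumes no prefix. */
#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX __USER_LABEL_PREFIX__
#else
#define LABEL_PREFIX
#endif

/* Two-level stringification so that LABEL_PREFIX is expanded before '#'. */
#define STRINGIFY_EXPANDED(x) #x
#define STRINGIFY(x) STRINGIFY_EXPANDED(x)

/* Assembly-level name of a C symbol, built by string literal concatenation. */
#define ASM_NAME(c_name) STRINGIFY(LABEL_PREFIX) c_name

static const char *my_function_asm_name(void)
{
    return ASM_NAME("my_function");
}
```

On an ELF target this should yield "my_function", and on Mach-O "_my_function", matching the two assembler listings above.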
Obviously the Linux code is 32-bit Linux code. Note that 64-bit Linux can run both 32- and 64-bit code!
The Mac code is definitely 64-bit code.
This is the main difference.
The ".cfi_xxx" lines only carry call frame information used for debugging and stack unwinding; newer compilers on Linux emit them as well.
I have compiled a program main.c with about two lines of code to see what directives gcc / gas add to the unoptimized assembly file, using:
gcc -o main.s main.c -S
I can look up the concise description of each directive on the gas directive page, but was hoping someone could give a bit more context to some of these directives and what its practical usage is (for example, in gdb or the linker or wherever downstream). Here is the full assembly file with the items in question below:
.file "main.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $4, -8(%rbp)
movl $6, -4(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
.file: it seems this is halfway obsolete, based on "This statement may go away in future: it is only recognized to be compatible with old as programs." But given that it is still there, where or how is it currently being used?
.ident: it seems like this gives the same thing as doing gcc --version. Is this used at all beyond giving helper information on the 'gcc' that was used to issue the command, or how is this used?
.section .note...: I have seen .section .text, .section .bss, .section .text, ... but I've never come across a .note, and doing a ctrl-f to search for note doesn't give anything on this page. What is this line doing with the three arguments? And the @progbits?
.size: given that the directives take up no space, this is giving us the address just after the last statement, ret, minus the address of the first statement within main, pushq %rbp, which is the length of the main function. But again, what is this used for? Also, it says on the as page that "It is only permitted inside .def/.endef pairs", but this isn't inside those pairs, right?
.section .text.startup,"ax",@progbits -- what is text.startup? The ax looks like it means allocatable+executable, but what or where is text.startup?
I'm trying to understand how the compiler "sees" the i+1 part of the expression i=i+1. I understand that i=3 means putting the value 3 in the memory location of variable i.
My guess about i=i+1 is that the compiler expects a value on the right side of the "=" operator, so it gets the value from the memory location of variable i (which is 3, after the assignment) and adds 1 to it, and the final result of the "i+1" expression (3+1=4) is stored back into the memory location of variable i as a value. Is that correct?
And if it is, it means that any variable or combination of variables and literals on the right side of an "=" operator will always be replaced with the values stored in them, and those values can be added, subtracted, etc. with the values from other variables or literals (as in the x+1 expression), whilst the final result of those calculations will also be a plain value (e.g. 5, a literal string, etc.), and will be stored as a value in the single variable on the left side of the "=" operator.
I'm also curious how this code looks in assembly, and what the main operations of this increment of i (i = i+1) are:
#include <stdio.h>
int main()
{
int i = 3;
i = i + 1; // i should have the value of 4 stored back in it;
return 0;
}
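The reading described above can be spelled out in C itself. This is only a sketch of the abstract steps, with hypothetical function names and a made-up temporary rhs; the compiler is free to generate something quite different as long as the result is the same.

```c
/* i = i + 1 written as separate steps. */
static int increment_in_steps(int i)
{
    int rhs = i + 1;  /* read the current value of i and add 1: the value of the right-hand side */
    i = rhs;          /* store that value into the object named i */
    return i;
}

/* The assignment is itself an expression; its value is the value that was stored. */
static int assignment_value(int i)
{
    int j = (i = i + 1);  /* i is updated first, then j receives the same value */
    return i + j;
}
```

With i starting at 3, increment_in_steps returns 4, and assignment_value returns 8 because both i and j end up as 4.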
This is not answerable for the general case. It depends on the target platform. If you want to inspect the assembly, you can do so with the -S parameter with gcc. When I did that to your code, it gave me this:
/tmp$ cat main.s
.file "main.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -4(%rbp)
addl $1, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",@progbits
A brief explanation of what is happening here. First we push the value of the base pointer (%rbp), so that the caller's stack frame can be restored later.
.cfi_startproc
pushq %rbp
Then we set up the stack frame with this code. It corresponds to declaring variables.
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
Then we have this. Comments are mine.
movl $3, -4(%rbp) # i = 3;
addl $1, -4(%rbp) # i = i + 1;
Lastly, we return from the main function
movl $0, %eax # Store 0 in the "return register"
popq %rbp # Restore the caller's base pointer
.cfi_def_cfa 7, 8
ret # return
Note that there is not a 1-1 relationship between lines. Not even for very simple lines.
Please also note that C imposes requirements on the observable behavior of the program and not on the generated assembly. So for instance, a compiler might remove the whole body of the main function because the variable i is not used in an observable way. And it will if you use optimization. When I recompiled your code with -O3 I got this instead:
/tmp/$ cat main.s
.file "main.c"
.text
.section .text.startup,"ax",@progbits
.p2align 4
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",@progbits
Notice how much got removed from main. It is also interesting that movl $0, %eax has changed to xorl %eax, %eax. If you think about it, it's pretty obvious that this is a "set to zero" operation. One could reasonably ask why anyone would write it like that. Well, the optimizer certainly does not optimize for readability. There are a few reasons why it is better. You can read about them here: What is the best way to set a register to zero in x86 assembly: xor, mov or and?
I was putting together a C riddle for a couple of my friends when a friend drew my attention to the fact that the following snippet (which happens to be part of the riddle I'd been writing) ran differently when compiled and run on OSX
#include <stdio.h>
#include <string.h>
int main()
{
int a = 10;
volatile int b = 20;
volatile int c = 30;
int data[3];
memcpy(&data, &a, sizeof(data));
printf("%d %d %d\n", data[0], data[1], data[2]);
}
What you'd expect the output to be is 10 20 30, which happens to be the case under Linux, but when the code is built under OSX you'd get 10 followed by two random numbers. After some debugging and looking at the compiler-generated assembly I came to the conclusion that this is due to how the stack is built. I am by no means an assembly expert, but the assembly code generated on Linux seems pretty straightforward to understand while the one generated on OSX threw me off a little. Perhaps I could use some help from here.
This is the code that was generated on Linux:
.file "code.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.section .text.startup,"ax",@progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB23:
.cfi_startproc
movl $10, -12(%rsp)
xorl %eax, %eax
movl $20, -8(%rsp)
movl $30, -4(%rsp)
ret
.cfi_endproc
.LFE23:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
And this is the code that was generated on OSX:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $20, -8(%rbp)
movl $30, -4(%rbp)
leaq L_.str(%rip), %rdi
movl $10, %esi
xorl %eax, %eax
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%d %d %d\n"
.subsections_via_symbols
I'm really only interested in two questions here.
Why is this happening?
Are there any get-arounds to this issue?
I know this is not a practical way to utilize the stack as I'm a professional C developer, which is really the only reason I found this problem interesting to invest some of my time into.
Accessing memory past the end of a declared variable is undefined behaviour - there is no guarantee as to what will happen when you try to do that. Because of how the compiler generated the assembly under Linux, you happened to get the 3 variables directly in a row on the stack, however that behaviour is just a coincidence - the compiler could legally add extra data in between the variables on the stack or really do anything - the result is not defined by the language standard. So in answer to your first question, it's happening because what you're doing is not part of the language by design. In answer to your second, there's no way to reliably get the same result from multiple compilers because the compilers are not programmed to reliably reproduce undefined behaviour.
This is undefined behavior. You shouldn't expect it to copy 10, 20, 30; you should hope it doesn't segfault.
There is nothing to guarantee that a,b, and c are sequential memory addresses, which is your naive assumption. On Linux, the compiler happened to make them sequential. You can't even rely on gcc always doing that.
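This does not reproduce the riddle, but for comparison here is a sketch of a version where the layout is guaranteed. It assumes the three values are moved into a single array object (the name values and the function copy_three are made up), because array elements are contiguous by definition while separate local variables are not.

```c
#include <string.h>

/* Copy three ints out of one array object into dest.
 * The count is fixed at 3 to mirror the snippet in the question. */
static void copy_three(int dest[3])
{
    const int values[3] = { 10, 20, 30 };  /* one object, elements are contiguous */

    /* The copy stays inside the bounds of values, so the behavior is defined. */
    memcpy(dest, values, sizeof values);
}
```

Calling copy_three(data) and printing data[0], data[1], data[2] gives 10 20 30 on any conforming compiler, regardless of how it arranges the stack.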
You already know that the behavior is undefined. A good reason for the behavior to be different on OS/X and Linux is these systems use a different compiler, that generates different code:
When you run gcc in Linux, you invoke the installed version of the GNU C compiler.
When you run gcc in your version of OS/X, you most likely invoke the installed version of clang.
Try gcc --version on both systems and amaze your friends.
Why does gcc take a long time to compile C code that has a big array declared at file scope?
#define MAXNITEMS 100000000
int buff[MAXNITEMS];
int main (int argc, char *argv[])
{
return 0;
}
I suspect a bug somewhere. There is no reason for the compile to take longer, no matter how big the array is since the compiler will just write an integer into the .bss segment since you never assign a value to an element in it. Proof:
.file "big.c"
.comm buff,4000000000000000000,32
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
As you can see, the only thing left of the array in the assembly is .comm buff,4000000000000000000,32.
I suggest you run gcc with -S to see the assembler code. Maybe your version of GCC has a bug. I tested with GCC 4.7.3 and the compile times here are the same, no matter which value I use.
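As a small companion to the point about .bss, here is a sketch with a zero-initialized array that has no initializer. ITEM_COUNT and the helper names are hypothetical, and the size is deliberately smaller than in the question. On typical ELF toolchains such an object only has its size recorded in the object file, and the zeros are provided when the program is loaded.

```c
#include <stddef.h>

#define ITEM_COUNT 1000000  /* hypothetical size, smaller than in the question */

/* No initializer: static storage is zero-initialized, and toolchains typically
 * place such an object in .bss rather than writing the zeros to the file. */
static int zeroed_items[ITEM_COUNT];

/* Size of the array in bytes, as the compiler sees it. */
static size_t zeroed_items_bytes(void)
{
    return sizeof zeroed_items;
}

/* Reads two elements to show that they start out as zero. */
static int first_and_last_sum(void)
{
    return zeroed_items[0] + zeroed_items[ITEM_COUNT - 1];
}
```

Compiling this with gcc -S and looking for zeroed_items in the output should show a size directive for the array rather than a million data words, though the exact directive depends on the compiler version and target.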
Related: Where are static variables stored (in C/C++)?