long compile time when using big arrays in the extern block - c

Why does gcc take a long time to compile C code if it has a big array in the extern block?
#define MAXNITEMS 100000000
int buff[MAXNITEMS];
int main (int argc, char *argv[])
{
return 0;
}

I suspect a bug somewhere. There is no reason for the compile to take longer, no matter how big the array is: since you never assign a value to any element, the compiler only records the array's size, and the storage is reserved in the .bss segment without any data being written out. Proof:
.file "big.c"
.comm buff,4000000000000000000,32
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits
As you can see, the only thing left of the array in the assembly is .comm buff,4000000000000000000,32.
I suggest you run gcc with -S to see the assembler code. Maybe your version of GCC has a bug. I tested with GCC 4.7.3 and the compile times here are the same, no matter which value I use.
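As a rough illustration of why the array should cost nothing at compile time, the sketch below declares a file-scope array without an initializer and inspects it at run time. The size NITEMS and the helper names buff_bytes and buff_all_zero are made up for this example, and the array is deliberately smaller than the one in the question so it runs comfortably anywhere.

```c
#include <stddef.h>

/* Hypothetical size, smaller than the question's MAXNITEMS. */
#define NITEMS 1000000

/* No initializer: the object file only records the size (a .comm or
   .bss entry); it does not store one million zeros. */
int buff[NITEMS];

/* Size of the array in bytes, as the compiler sees it. */
size_t buff_bytes(void)
{
    return sizeof buff;
}

/* Returns 1 if every element is zero, which C guarantees for objects
   with static storage duration and no initializer. */
int buff_all_zero(void)
{
    for (size_t i = 0; i < NITEMS; i++) {
        if (buff[i] != 0)
            return 0;
    }
    return 1;
}
```

Compiling a file like this with gcc -S and looking for buff in the output is one way to check that only a size is emitted for the array.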
Related: Where are static variables stored (in C/C++)?

Related

Understanding a few of the 'helper' gnu-as directives

I have compiled a program main.c with about two lines of code to see what directives gcc / gas add to the unoptimized assembly file, using:
gcc -o main.s main.c -S
I can look up the concise description of each directive on the gas directive page, but was hoping someone could give a bit more context on some of these directives and what their practical use is (for example, in gdb or the linker or wherever downstream). Here is the full assembly file, with the items in question listed below it:
.file "main.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $4, -8(%rbp)
movl $6, -4(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",@progbits
.file: it seems this is halfway obsolete, based on the statement "This statement may go away in future: it is only recognized to be compatible with old as programs." But given that it is still there, where or how is this currently being used?
.ident: it seems like this gives the same thing as doing gcc --version. Is this used at all beyond giving helper information on the 'gcc' that was used to issue the command, or how is this used?
.section .note...: I have seen .section .text, .section .bss, and so on, but I've never come across a .note, and doing a ctrl-f to search for note doesn't give anything on this page. What is this line doing with the three arguments? And what is @progbits?
.size: given that the directives take up no space, this is giving us the distance from the first statement within main (pushq %rbp) to just past the last statement (ret), which is the length of the main function. But again, what is this used for? Also, it says on the as page that "It is only permitted inside .def/.endef pairs", but this isn't inside those pairs, right?
.section .text.startup,"ax",@progbits: what is text.startup? The ax looks like it means allocatable+executable, but what or where is the text.startup?

Is it possible to convert given C code to x86 assembly?

I'm working on an AWD obstacle avoidance robot in x86 assembly. I can find programs for it that are already written in C, but I can't find any written in x86 assembly.
How do I convert this C code to x86 assembly code?
The complete code is here:
http://www.mertarduino.com/arduino-obstacle-avoiding-robot-car-4wd/2018/11/22/
void compareDistance() // find the longest distance
{
if (leftDistance>rightDistance) //if left is less obstructed
{
turnLeft();
}
else if (rightDistance>leftDistance) //if right is less obstructed
{
turnRight();
}
else //if they are equally obstructed
{
turnAround();
}
}
int readPing() { // read the ultrasonic sensor distance
delay(70);
unsigned int uS = sonar.ping();
int cm = uS/US_ROUNDTRIP_CM;
return cm;
}
How do I convert this C code to x86 assembly code?
Converting source code to assembly is basically what a compiler does, so just compile it. Most (if not all) compilers have the option of outputting the intermediate assembly code.
If you use gcc -S main.c you will get a file called main.s containing the assembly code.
Here is an example:
$ cat hello.c
#include <stdio.h>
void print_hello() {
puts("Hello World!");
}
int main() {
print_hello();
}
$ gcc -S hello.c
$ cat hello.s
.file "hello.c"
.text
.section .rodata
.LC0:
.string "Hello World!"
.text
.globl print_hello
.type print_hello, @function
print_hello:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size print_hello, .-print_hello
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
call print_hello
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Debian 8.3.0-6) 8.3.0"
.section .note.GNU-stack,"",@progbits
How do I convert this C code to x86 assembly code?
You can use the gcc -m32 -S main.c command to do that, where:
the -S flag indicates that the output must be assembly,
the -m32 flag indicates that you want to produce i386 (32-bit) output.
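To try gcc -S on the code from the question outside the Arduino environment, the functions need to be self-contained. The following is a minimal sketch under assumptions: the turnLeft/turnRight/turnAround calls are replaced by a hypothetical enum turn return value, the distances become parameters, and US_ROUNDTRIP_CM is defined locally as 57 (the value I believe the NewPing library uses; treat it as an assumption).

```c
/* Hypothetical stand-ins for the robot's motor routines. */
enum turn { TURN_LEFT, TURN_RIGHT, TURN_AROUND };

/* Assumed conversion factor: microseconds of round trip per centimetre. */
#define US_ROUNDTRIP_CM 57

/* Same decision logic as compareDistance() in the question. */
enum turn compare_distance(int left_distance, int right_distance)
{
    if (left_distance > right_distance)
        return TURN_LEFT;    /* left is less obstructed */
    else if (right_distance > left_distance)
        return TURN_RIGHT;   /* right is less obstructed */
    else
        return TURN_AROUND;  /* equally obstructed */
}

/* Same arithmetic as readPing(), minus the delay and the sensor access. */
int ping_to_cm(unsigned int us)
{
    return (int)(us / US_ROUNDTRIP_CM);
}
```

Saving this as, say, robot.c and running gcc -m32 -S robot.c should produce a robot.s containing both functions as 32-bit x86 assembly for the host toolchain.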

How to run converted .asm code from .c using 'gcc' in Emu8086

I am new here. I converted some code from C to assembly, but the result doesn't look like normal assembly code. So my question is: how can I convert C (or C++) code to assembly language so that the converted code can be run in Emu8086?
Here is a simple C program:
#include<stdio.h>
void Hello(){
printf("Hello world");
}
int main (){
Hello();
return 0;
}
Then I converted it with gcc -S test.c and here is the output:
.file "test1.c"
.section .rodata
.LC0:
.string "Hello world"
.text
.globl Hello
.type Hello, @function
Hello:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf@PLT
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size Hello, .-Hello
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
call Hello
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516"
.section .note.GNU-stack,"",@progbits
Emu8086 does what it says on the tin: it emulates an Intel 8086 processor. The assembly that GCC has produced is for your host machine (since you haven't told it to do otherwise), which evidently uses the x86-64 instruction set. The 8086 can't understand most of these instructions. You need to cross-compile to a 16-bit x86 real-mode executable. The -m16 option on GCC will generate 16-bit code, but it apparently still uses 32-bit registers (EAX, etc.). So you will have to find a compiler that targets the basic 8086 instruction set.

inline vs static inline c

Here are some simple tests run on an x86_64 machine to show the assembler code generated when using the inline keyword:
TEST 1
#include <stdio.h>
static inline void
show_text(void)
{
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
return 0;
}
And assembler :
gcc -O0 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 1 result : inline suggestion not taken into account by compiler
Test 2
Same code as test 1, but with -O1 optimization flag
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello"
.text
.globl main
.type main, @function
main:
subq $8, %rsp
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
addq $8, %rsp
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 2 result : no more show_text function defined in assembler
Test 3
show_text not declared as inline, -O1 optimization flag
Test 3 result : no more show_text function defined in assembler, with or without inline : same generated code
Test 4
#include <stdio.h>
static inline void
show_text(void)
{
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
printf("Hello\n");
}
int main(int argc, char *argv[])
{
show_text();
show_text();
return 0;
}
produces :
gcc -O1 -fno-asynchronous-unwind-tables -S -masm=att main.c && less main.s
.file "main.c"
.text
.section .rodata
.LC0:
.string "Hello"
.text
.type show_text, @function
show_text:
pushq %rbp
movq %rsp, %rbp
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
leaq .LC0(%rip), %rdi
call puts@PLT
nop
popq %rbp
ret
.size show_text, .-show_text
.globl main
.type main, @function
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
call show_text
call show_text
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (GNU) 7.3.1 20180312"
.section .note.GNU-stack,"",@progbits
Test 4 result : show_text defined in assembler, inline suggestion not taken into account
I understand that the inline keyword does not force inlining. But for the Test 1 results, what prevents the show_text code from being inlined into main?
Until now, I have been declaring some small static functions inline in my C source code. But from these results that seems quite useless.
Why should I declare some of my small functions static inline when using some modern compilers (and possibly compiling optimized code)?
It is one of those questionable decisions of the C Language Standards people... use of inline does not guarantee a function to be inlined... the keyword only suggests to the compiler that the function could be inlined.
I've had lengthy exchanges on this topic with the ISO WG; this followed a MISRA guideline that requires all inline functions to be declared at module scope using the static keyword. Their logic is that there may be circumstances where the compiler needs to not inline the function... and equally, there may be cases where that non-inlined function needs to have global scope!
IMHO, if a programmer adds the inline keyword, then the suggestion is that they know what they are doing, and that function should be inline.
As you suggest, in its current form, the inline keyword is effectively pointless, unless a compiler treats it seriously.
In your first test you disable optimizations. Inlining is an optimization method. Do not expect it to happen.
Also, the inline keyword doesn't work nowadays the way it used to. I'd say its only purpose is to allow functions in headers without linker errors about duplicated symbols (when more than one cpp file uses such a header).
Let your compiler do its work. Just enable optimizations (including LTO) and do not worry about details.
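As a sketch of the points above: plain static inline is only a hint, so at -O0 GCC normally emits a call. GCC and Clang also offer the always_inline function attribute, which GCC documents as inlining regardless of whether optimization is enabled. The function names below are invented for illustration.

```c
/* Plain hint: the compiler may or may not inline this. */
static inline int square_hint(int x)
{
    return x * x;
}

/* GCC/Clang extension: request inlining even without optimization. */
static inline __attribute__((always_inline)) int square_forced(int x)
{
    return x * x;
}

/* Both calls compute the same value; only the generated code differs. */
int sum_of_squares(int a, int b)
{
    return square_hint(a) + square_forced(b);
}
```

Compiling this with gcc -O0 -S and searching the output for square_hint and square_forced is one way to see whether a separate function body and a call were emitted for each.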

Finding out type of assembly language generated by `gcc hello_world.c -S`

hello_world.c
#include <stdio.h>
int main()
{
printf("Hello World\n");
return 0;
}
Running gcc hello_world.c -S generates a hello_world.s file in assembly language.
hello_world.s
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Is there some way to find out which type of assembly language the code was generated in (besides knowing the syntax of all assembly languages)?
Reference for myself or anyone else who didn't know this:
To get your processor architecture, run the following:
uname -p
By default it is AT&T syntax for the GNU assembler, using the instruction set of the CPU the code is compiled for. There are options to alter that (on x86, for example, -masm=intel selects Intel syntax).
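Besides uname -p, the compiler itself can report which architecture it targets through predefined macros. This is a small sketch; target_arch_name and pointer_bits are made-up helpers, and the macro list covers only a few common GCC/Clang targets. It identifies the instruction set, not the assembler syntax.

```c
/* Returns a short name for the architecture this file was compiled for,
   based on macros that GCC and Clang predefine. */
const char *target_arch_name(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}

/* Pointer width in bits: typically 64 for x86-64 and 32 with -m32. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}
```

Building the same file with and without -m32 and printing both values is a quick way to confirm which target a given gcc invocation produces.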

Resources