Strange assembly output when optimizing string hashing - c

While trying to create a compile-time hash macro, I got it working, but it had its problems. So I thought that if the strings are known at compile time (which they are), the whole hashing should get optimized away to a constant.
This gcc C99 code with optimization level -O3 enabled:
#include <stdio.h>
int main(void)
{
    char const *const string = "hello";
    int hash = 0;
    for (unsigned char i=0; i < sizeof string; ++i)
    {
        hash += string[i]; //reeaally simple hash :)
    }
    printf("%i", hash);
    return 0;
}
produced the following assembly code:
.LC0:
.string "hello"
.LC1:
.string "%i"
main:
sub rsp, 8
movsx eax, BYTE PTR .LC0[rip+6]
movsx edx, BYTE PTR .LC0[rip+7]
mov edi, OFFSET FLAT:.LC1
lea esi, [rax+532+rdx]
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret
whereas the same code with "hello" changed to "hello w" produces this assembly code, in which the hashing is completely optimized away:
.LC0:
.string "%i"
main:
sub rsp, 8
mov esi, 683
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret
What is the reason? Does this mean I can't use this way of hashing because it may be that the overhead won't get optimized out? How can I make sure there won't be any overhead, what are alternatives?
EDIT 1:
I have played around a bit, and it seems that if the number of chars in the string is 6 the hashing is not optimized away, whereas if it is 7 it is.

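sizeof is wrong here. It returns the size of the char pointer (8 bytes on x86-64), not the length of the string.
In your case this is UB: "hello" occupies only 6 bytes including the terminator, so the last two iterations read outside the string literal's bounds and the compiler cannot fold the loop into a constant. With "hello w" the literal happens to be exactly 8 bytes long, so every read is in bounds. It is a clang bug, not a feature.
If you do it properly, gcc will optimize it as well: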
#include <stdio.h>
int main(void)
{
    char const string[] = "hello"; // an array, so sizeof counts its bytes (including the '\0')
    int hash = 0;
    for (unsigned char i=0; i < sizeof(string); ++i)
    {
        hash += string[i]; //reeaally simple hash :)
    }
    printf("%i", hash);
    return 0;
}
https://godbolt.org/z/YCCNCt
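To make the difference concrete, here is a minimal sketch contrasting sizeof on an array, sizeof on a pointer, and strlen. The names sum_hash, as_array, as_pointer and the three small accessor functions are hypothetical and exist only for illustration; the sketch assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sum the bytes s[0..n-1], like the hash above. */
int sum_hash(const char *s, size_t n)
{
    assert(s != NULL);
    int hash = 0;
    for (size_t i = 0; i < n; ++i)
        hash += s[i];
    return hash;
}

/* An array: sizeof counts every element, including the '\0'. */
static const char as_array[] = "hello";
/* A pointer: sizeof only measures the pointer object itself. */
static const char *const as_pointer = "hello";

size_t array_size(void)   { return sizeof as_array; }    /* 6 bytes */
size_t pointer_size(void) { return sizeof as_pointer; }  /* sizeof(char *) */
size_t string_len(void)   { return strlen(as_pointer); } /* 5 characters */
```

Only the array form gives the loop a bound that matches the data. With the pointer form, the bound depends on the platform's pointer size and has nothing to do with the string.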

Related

Static keyword in an array function parameter declaration

Here is the explanation of what it means 6.7.6.3/7:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by the
size expression.
It is not quite clear what it means. I ran the following example:
main.c
#include <stdio.h>
#include "func.h"
int main(void){
    char test[4] = "123";
    printf("%c\n", test_func(2, test));
}
And 2 different implementations of the test_func:
static version
func.h
char test_func(size_t idx, const char[const static 4]);
func.c
char test_func(size_t idx, const char arr[const static 4]){
    return arr[idx];
}
non-static version
func.h
char test_func(size_t idx, const char[const 4]);
func.c
char test_func(size_t idx, const char arr[const 4]){
    return arr[idx];
}
I checked the assembly code compiled with gcc 7.4.0 -O3 of the function in both of the cases and it turned out to be completely identical:
Disassembly of the functions
(gdb) disas main
sub rsp,0x18
mov edi,0x2
lea rsi,[rsp+0x4]
mov DWORD PTR [rsp+0x4],0x333231
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rsp+0x8],rax
xor eax,eax
call 0x740 <test_func>
[...]
(gdb) disas test_func
movzx eax,BYTE PTR [rsi+rdi*1]
ret
Can you give an example where the static keyword gives some benefits (or any differences at all) comparing to non-static counterpart?
Here is an example where static actually makes a difference:
unsigned foo(unsigned a[2])
{
    return a[0] ? a[0] * a[1] : 0;
}
clang (for x86-64, with -O3) compiles this to
foo:
mov eax, dword ptr [rdi]
test eax, eax
je .LBB0_1
imul eax, dword ptr [rdi + 4]
ret
.LBB0_1:
xor eax, eax
ret
But after replacing the function parameter with unsigned a[static 2], the result is simply
foo:
mov eax, dword ptr [rdi + 4]
imul eax, dword ptr [rdi]
ret
The conditional branch is not necessary because a[0] * a[1] evaluates to the correct result whether a[0] is zero or not. But without the static keyword, the compiler cannot assume that a[1] can be accessed, and thus has to check a[0].
Currently only clang does this optimization; ICC and gcc produce the same code in both cases.
This isn't used much by compilers in my experience, but one use is that the compiler can assume that the (array decayed into pointer) parameter is not NULL.
Given this function, both gcc and clang (x86) produce identical machine code at -O3:
int func (int a[2])
{
    if(a)
        return 1;
    return 0;
}
Disassembly:
func:
xor eax, eax
test rdi, rdi
setne al
ret
When changing the parameter to int a[static 2], gcc gives the same output as before, but clang does a better job:
func:
mov eax, 1
ret
clang realizes that a can never be NULL, so it can skip the check.
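For reference, a minimal sketch of how the static form is written and called. The functions dot2 and dot2_of_first_two are hypothetical names used only for illustration. The point is that static 2 is a promise made by the caller: both arguments must point to at least two accessible elements, which also rules out NULL.

```c
#include <assert.h>

/* Hypothetical function: the caller promises that a and b each point to
   at least 2 accessible elements, so neither may be NULL. */
unsigned dot2(const unsigned a[static 2], const unsigned b[static 2])
{
    return a[0] * b[0] + a[1] * b[1];
}

/* Passing a larger array is fine: "static 2" is only a lower bound. */
unsigned dot2_of_first_two(void)
{
    const unsigned x[3] = {1, 2, 3};
    const unsigned y[2] = {4, 5};
    /* The caller's side of the contract: at least 2 elements each. */
    assert(sizeof x / sizeof x[0] >= 2 && sizeof y / sizeof y[0] >= 2);
    return dot2(x, y); /* 1*4 + 2*5 */
}
```

Calling dot2 with NULL or with a one-element array would be undefined behaviour. Whether a compiler diagnoses that or exploits it for optimization varies, as the answers above show.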

passing an array of chars to external assembly function

My question is basic, but I had a hard time finding anything on the internet.
Let's say I want to write a function in C that calls an external NASM function written in x86_64 assembly.
I want to pass the external function two char* strings of numbers, perform some arithmetic operations on the two, and return a char* with the result. My idea was to iterate over [rdi] and [rsi] somehow and save the result in rax (i.e. add rax, [rdi], [rsi]), but I'm having a hard time actually doing so. What would be the right way to go over each character? Increasing [rsi] and [rdi]? And also, would I only need to move the value of the first character to rax?
Thanks in advance!
If you could post your assembly/C code, it would be easier to suggest changes.
For any assembly, I would start with C code (since I think in C :)), convert it to assembly using a compiler, and then optimize the assembly as needed. Assuming you need to write a function that takes two strings, adds them, and returns the result as an int, like the following:
int ext_asm_func(unsigned char *arg1, unsigned char *arg2, int len)
{
    int i, result = 0;
    for(i=0; i<len; i++) {
        result += arg1[i] + arg2[i];
    }
    return result;
}
Here is assembly (generated by gcc https://godbolt.org/g/1N6vBT):
ext_asm_func(unsigned char*, unsigned char*, int):
test edx, edx
jle .L4
lea r9d, [rdx-1]
xor eax, eax
xor edx, edx
add r9, 1
.L3:
movzx ecx, BYTE PTR [rdi+rdx]
movzx r8d, BYTE PTR [rsi+rdx]
add rdx, 1
add ecx, r8d
add eax, ecx
cmp r9, rdx
jne .L3
rep ret
.L4:
xor eax, eax
ret
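On the question of what to increment: it is the pointers (the values in rdi and rsi) that advance, not the bytes they point to. A minimal C sketch of that idea follows, assuming NUL-terminated inputs rather than an explicit length. The function sum_byte_strings is a hypothetical name; like the function above, it sums raw byte values rather than parsed numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the loop: a and b play the roles of rdi and rsi.
   Walks both NUL-terminated strings in lockstep and stops at the end of
   the shorter one. */
int sum_byte_strings(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    int total = 0;          /* plays the role of eax */
    while (*a != '\0' && *b != '\0') {
        total += *a + *b;   /* load one byte from each string */
        ++a;                /* like "inc rdi": advance the pointer, */
        ++b;                /* not the byte it points to */
    }
    return total;
}
```

Compiling a function like this and reading the generated assembly, as suggested above, shows one way to express the pointer increments and byte loads in assembly.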

Is gcc doing a recursive call?

I'm compiling this code with -O3 -x c -std=c99 -fno-builtin -nostdlib -ffreestanding
unsigned char *memset(unsigned char *dest, unsigned char val, int count)
{
    unsigned char* p = dest;
    while (count--)
        *p++ = val;
    return dest;
}
#include <stdio.h>
int main()
{
    unsigned char c[20];
    memset(c, 'a', 19);
    c[19] = '\0';
    printf((const char*) c);
}
and using godbolt to examine what memset gcc is calling in the assembly output.
memset:
test edx, edx
je .L6
sub edx, 1
sub rsp, 8
movzx esi, sil
add rdx, 1
call memset
add rsp, 8
ret
.L6:
mov rax, rdi
ret
main:
sub rsp, 40
movabs rax, 7016996765293437281
mov QWORD PTR [rsp], rax
mov QWORD PTR [rsp+8], rax
mov eax, 24929
mov WORD PTR [rsp+16], ax
mov rdi, rsp
xor eax, eax
mov BYTE PTR [rsp+18], 97
mov BYTE PTR [rsp+19], 0
call printf
add rsp, 40
ret
With the flags I used I'm attempting to eliminate all possibility of it calling a built-in memset and judging from the colorization godbolt uses, it looks like gcc is doing a recursive call at *p++ = val;. So is it doing recursion or calling builtin memset?
As others have indicated, the setting of the elements of array c has been inlined. As a result, the memset() you implemented is not even called. This is a consequence of the -O3 compiler option: the compiler is being very aggressive in its optimizations. Furthermore, there is no recursion on the execution path.
However, that does not entirely answer your question. The memset() shown in the disassembled output is indeed NOT the built-in version, and it is not even executed.
Incidentally, you do not need the -fno-builtin flag, as -ffreestanding implies it. Also, if you enable garbage collection, I am sure you will find that the memset() routine vanishes from the disassembled output.

How do I best use the const keyword in C?

I am trying to get a sense of how I should use const in C code. At first I didn't really bother using it, but then I saw quite a few examples of const being used throughout. Should I make an effort to go back and religiously make suitable variables const? Or would I just be wasting my time?
I suppose it makes it easier to read which variables that are expected to change, especially in function calls, both for humans and the compiler. Am I missing any other important points?
const is typed, #define macros are not.
const is scoped by C block, #define applies to a file (or more strictly, a compilation unit).
const is most useful with parameter passing. If you see const used on a prototype with pointers, you know it is safe to pass your array or struct because the function will not alter it. Without const, it can.
Look at the declaration of a function such as strcpy() and you will see what I mean. Apply "const-ness" to function prototypes at the outset. Retrofitting const is not so much difficult as "a lot of work" (but OK if you get paid by the hour).
Also consider:
const char *s = "Hello World";
char *s = "Hello World";
which is correct, and why?
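A minimal sketch of both points, using the hypothetical helpers count_char and count_l_in_greeting. A const-qualified pointer parameter documents that the function only reads through it, and because modifying a string literal is undefined behaviour in C, pointing at a literal through a pointer to const lets the compiler reject accidental writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: "const char *" tells the caller that the function
   only reads through s. */
size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c)
            ++n;
        /* "*s = 'x';" here would be a compile error: s points to const char */
    }
    return n;
}

size_t count_l_in_greeting(void)
{
    /* The pointer-to-const form: writes through s are rejected at compile
       time instead of invoking undefined behaviour at run time. */
    const char *s = "Hello World";
    return count_char(s, 'l');
}
```

A caller holding a plain char array can still pass it to count_char, since converting char * to const char * is implicit. The reverse conversion requires a cast.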
How do I best use the const keyword in C?
Use const when you want to make it "read-only". It's that simple :)
Using const is not only a good practice but improves the readability and comprehensibility of the code as well as helps prevent some common errors. Definitely do use const where appropriate.
Besides acting as a compiler guard (attempting to modify the constant, or passing it as a non-const parameter, produces a compiler error), const also enables certain optimisations: knowing that the value will not change, the compiler can cache it instead of reading it fresh from memory, and can substitute it immediately in the code.
C const
const and register are basically the opposite of volatile: using volatile overrides the const optimisations at file and block scope and the register optimisations at block scope. const register and register produce identical output, because on gcc const does nothing for C at block scope at -O0 and is redundant from -O1 onwards; so only the register optimisations apply at -O0, and they too are redundant from -O1 onwards.
#include<stdio.h>
int main() {
    const int i = 1;
    printf("%d", i);
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4] //load from stack isn't eliminated for block-scope consts on gcc C unlike on gcc C++ and clang C, even though value will be the same
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
In this instance, with -O0, const, volatile and auto all produce the same code; only register differs. Compare a file-scope variable:
#include<stdio.h>
int i = 1;
int main() {
    printf("%d", i);
}
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
with const int i = 1; instead:
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, 1 //saves load from memory, now immediate
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
C++ const
#include <iostream>
int main() {
    int i = 1;
    std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores on stack
mov eax, DWORD PTR [rbp-4] //loads the value stored on the stack
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int main() {
    const int i = 1;
    std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores it on the stack
mov esi, 1 //but saves a load from memory here, unlike on C
//'register' would skip this store on the stack altogether
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int i = 1;
int main() {
    std::cout << i;
}
i:
.long 1
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
#include <iostream>
const int i = 1;
int main() {
    std::cout << i;
}
main:
push rbp
mov rbp, rsp
mov esi, 1 //eliminated load from memory, now immediate
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
C++ has the extra restriction of producing a compiler error if a const is not initialised (both at file-scope and block-scope). const also has internal linkage as a default on C++. volatile still overrides const and register but const register combines both optimisations on C++.
Even though all the above code is compiled using the default implicit -O0, when compiled with -Ofast, const surprisingly still isn't redundant on C or C++ on clang or gcc for file-scoped consts. The load from memory isn't optimised out unless const is used, even if the file-scope variable isn't modified in the code. https://godbolt.org/z/PhDdxk.

Operand Size Conflict in x86 Assembly

I've just started programming in Assembly for my computer organization course, and I keep getting an operand size conflict error whenever I try to compile this asm block within a C program.
The arrayOfLetters[] object is a char array, so shouldn't each element be one byte? The code works when I do mov eax, arrayOfLetters[1], but I'm not sure why that works, as the eax register is 4 bytes.
#include <stdio.h>
#define SIZE 3
char findMinLetter( char arrayOfLetters[], int arraySize )
{
    char min;
    __asm{
        push eax
        push ebx
        push ecx
        push edx
        mov dl, 0x7f // initialize DL
        mov al, arrayOfLetters[1] //Problem occurs here
        mov min, dl // read DL
        pop edx
        pop ecx
        pop ebx
        pop eax
    }
    return min;
}
int main()
{
    char arrayOfLetters[ SIZE ] = {'a','B','c'};
    int i;
    printf("\nThe original array of letters is:\n\n");
    for(i=0; i<SIZE; i++){
        printf("%c ", arrayOfLetters[i]);
    }
    printf("\n\n");
    printf("The smallest (potentially capitalized) letter is: %c\n", findMinLetter( arrayOfLetters, SIZE ));
    return 0;
}
Use mov al, BYTE PTR arrayOfLetters[1].
You can compile the code with MSVC using cl input.c /Faoutput.asm to get an assembly listing. This shows that plain arrayOfLetters[1] translates to DWORD PTR, so you need to explicitly state that you want a BYTE PTR.
