I need to compare some char * (which I know the length of) with some string literals. Right now I am doing it like this:
void do_something(char *str, int len) {
    if (len == 2 && str[0] == 'O' && str[1] == 'K' && str[2] == '\0') {
        // do something...
    }
}
The problem is that I have many comparisons like this to make, and it's tedious to break apart and type out each one. Comparisons written this way are also hard to maintain, and it's easy to introduce bugs.
My question is whether there is a shorthand way to write this (maybe a macro).
I know there is strncmp and I have seen that GCC optimizes it. So, if the shorthand is to use strncmp, like this:
void do_something(char *str, int len) {
    if (len == 2 && strncmp(str, "OK", len) == 0) {
        // do something...
    }
}
Then, I would like to know whether the second example has the same (or better) performance as the first one.
Yes it will. However, your code is not comparing a char * to a string literal. It is comparing two string literals. The compiler is smart enough to spot this and optimize all the code away. Only the code inside the if block remains.
We can see this by looking at the assembly code generated by the compiler:
cc -S -std=c11 -pedantic -O3 test.c
First with your original code...
#include <stdio.h>
#include <string.h>

int main() {
    unsigned int len = 2;
    char *str = "OK";
    if (len == 2 && strncmp(str, "OK", len) == 0) {
        puts("Match");
    }
}
Then with just the puts.
#include <stdio.h>
#include <string.h>

int main() {
    //unsigned int len = 2;
    //char *str = "OK";
    //if (len == 2 && strncmp(str, "OK", len) == 0) {
    puts("Match");
    //}
}
The two assembly files are practically the same. No trace of the strings remains, only the puts.
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14 sdk_version 10, 14
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## #main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
callq _puts
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## #.str
.asciz "Match"
.subsections_via_symbols
This is a poor place to focus on optimization. String comparison against small strings is very unlikely to be a performance problem.
Furthermore, your proposed optimization is likely slower. You need to get the length of the input string, and that requires walking the full length of the input string. Maybe you need that length for other reasons, but that's increasingly an edge case.
strncmp, on the other hand, can stop as soon as it sees unequal characters, and it only has to read up to the end of the shorter string.
Your example implies that your strings are always NUL terminated. In that case, don't bother getting their length ahead of time, since that involves searching for the NUL. Instead, you can do
memcmp(str, "OK", 3);
This way, the NULs get compared too. If your string is longer than 2 characters, the result will be > 0, and if it's shorter, the result will be < 0.
This is a single function call, and memcmp is virtually guaranteed to be better optimized than your hand-written code. At the same time, don't bother optimizing unless you find this code to be a bottleneck. Keep in mind also that any benchmark I run on my machine will not necessarily apply to yours.
The only real reason to make this change is for readability.
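If you still want the shorthand asked about in the question, a small macro over memcmp can pull the length from the literal itself. This is only a sketch (the name STR_EQ_LIT is mine, not from the question), and it carries the same assumption as the memcmp call above: str must point to at least sizeof(lit) readable bytes, which holds for a NUL-terminated string at least as long as the literal.
#include <string.h>

/* Sketch of a shorthand macro (my naming, not from the question).
 * sizeof(lit) counts the literal's NUL terminator, so the terminators
 * are compared too, exactly as in the memcmp(str, "OK", 3) example.
 * Assumes str points to at least sizeof(lit) readable bytes. */
#define STR_EQ_LIT(str, lit) (memcmp((str), (lit), sizeof(lit)) == 0)

void do_something(char *str) {
    if (STR_EQ_LIT(str, "OK")) {
        // do something...
    }
}
Because the length comes from sizeof on the literal, there is nothing to keep in sync by hand, which addresses the maintainability concern as well as the typing.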
Related
I wrote a simple function in order to check if malloc works. I create a 1 GB array and fill it with numbers, but the heap usage does not seem to change. Here is the code:
#include <stdio.h>
#include <assert.h> // For assert()
#include <stdlib.h> // For malloc(), free() and realloc()
#include <unistd.h> // For sleep()
static void create_array_in_heap()
{
    double *b;
    b = (double *)malloc(sizeof(double) * 1024 * 1024 * 1024);
    assert(b != NULL); // Check that the allocation succeeded

    int i;
    for (i = 0; i < 1024 * 1024 * 1024; i++);
        b[i] = 1;

    sleep(10);
    free(b);
}

int main()
{
    create_array_in_heap();
    return 0;
}
(screenshot of Linux's system monitor)
Any ideas why?
EDIT: a simpler explanation is given in the comments. But my answer applies once the ; has been removed.
An aggressive optimizing compiler, such as Clang (Compiler Explorer link), can see that the only important part of your function create_array_in_heap is the call to sleep. The rest has no functional value, since you only fill a memory block and then discard it, so the compiler removes it. This is the entirety of your program compiled by Clang 7.0.0 with -O2:
main: # #main
pushq %rax
movl $10, %edi
callq sleep
xorl %eax, %eax
popq %rcx
retq
In order to benchmark any aspect of a program, the program should have been designed to output a result (computing and discarding the result is too easy for the compiler to optimize into nothing). The result should also be computed from inputs that aren't known at compile-time, otherwise the computation always produces the same result and can be optimized by constant propagation.
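For illustration, a version of the allocation test that respects both of those rules might look like this. It is a sketch of mine, not the asker's code; summing the array and sizing it from argc are arbitrary choices made only so the result is used and the size is not a compile-time constant.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    (void)argv;
    /* The size depends on argc, so it is not known at compile time. */
    size_t n = (size_t)argc * 1024 * 1024;
    double *b = malloc(sizeof(double) * n);
    if (b == NULL)
        return 1;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        b[i] = (double)i;
    for (size_t i = 0; i < n; i++)
        sum += b[i];

    /* Printing the result forces the work to actually happen. */
    printf("%f\n", sum);
    free(b);
    return 0;
}
With the result printed and the size unknown, the optimizer is far less likely to be able to discard the allocation and the loops.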
Can a C compiler ever optimize a loop by running it?
For example:
int num[] = {1, 2, 3, 4, 5}, i;

for (i = 0; i < sizeof(num)/sizeof(num[0]); i++) {
    if (num[i] > 6) {
        printf("Error in data\n");
        exit(1);
    }
}
Instead of running this each time the program is executed, can the compiler simply run this and optimize it away?
Let's have a look… (This really is the only way to tell.)
First, I've converted your snippet into something we can actually try to compile and run, and saved it in a file named main.c.
#include <stdio.h>

static int
f()
{
    const int num[] = {1, 2, 3, 4, 5};
    int i;
    for (i = 0; i < sizeof(num) / sizeof(num[0]); i++)
    {
        if (num[i] > 6)
        {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}

int
main()
{
    return f();
}
Running gcc -S -O3 main.c produces the following assembly file (in main.s).
.file "main.c"
.section .text.unlikely,"ax",#progbits
.LCOLDB0:
.section .text.startup,"ax",#progbits
.LHOTB0:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB22:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE22:
.size main, .-main
.section .text.unlikely
.LCOLDE0:
.section .text.startup
.LHOTE0:
.ident "GCC: (GNU) 5.1.0"
.section .note.GNU-stack,"",#progbits
Even if you don't know assembly, you'll notice that the string "Error in data\n" is not present in the file, so apparently some kind of optimization must have taken place.
If we look closer at the machine instructions generated for the main function,
xorl %eax, %eax
ret
we can see that all it does is XOR the EAX register with itself (which always results in zero) and then return. The EAX register is used to hold the return value, so main simply returns 0; the f function was completely optimized away.
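For contrast, if the values are not known at compile time, say the array is filled from input and the check runs over caller-supplied data, the compiler can no longer fold the loop away. A rough sketch of such a variant (mine, not from the question):
#include <stdio.h>

/* Same check, but over data supplied at run time, so the compiler
 * cannot evaluate the loop in advance. */
static int f(const int *num, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (num[i] > 6) {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    int num[5];
    for (size_t i = 0; i < 5; i++) {
        if (scanf("%d", &num[i]) != 1)
            return 1;
    }
    return f(num, 5);
}
Compiling this the same way leaves the comparison loop (or a vectorized equivalent) in the output, because the answer genuinely depends on the input.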
Yes. Optimizing C compilers can unroll and even fully evaluate loops like this at higher optimization levels (for example -O3 with GCC, or -Otime on compilers that offer it).
You didn't specify the compiler, but with gcc and -O3, taking the size calculation out of the for condition might also help it a little.
Compilers can do even better than that. Not only can compilers examine the effect of running code "forward", but the Standard even allows them to work code logic in reverse in situations involving potential Undefined Behavior. For example, given:
#include <stdio.h>

int main(void)
{
    int ch = getchar();
    int q;
    if (ch == 'Z')
        q = 5;
    printf("You typed %c and the magic value is %d", ch, q);
    return 0;
}
a compiler would be entitled to assume that the program will never receive any input which would cause the printf to be reached without q having received a value; since the only input character which would cause q to receive a value would be 'Z', a compiler could thus legitimately replace the code with:
int main(void)
{
    getchar();
    printf("You typed Z and the magic value is 5");
}
If the user types Z, the behavior of the original program will be well-defined, and the behavior of the latter will match it. If the user types anything else, the original program will invoke Undefined Behavior and, as a consequence, the Standard will impose no requirements on what the compiler may do. A compiler will be entitled to do anything it likes, including producing the same result as would be produced by typing Z.
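If that latitude is unwanted, the fix is to make sure q has a value on every path, so the behavior is defined for any input. A minimal sketch of that fix (the default value 0 is my arbitrary choice, not something from the original program):
#include <stdio.h>

int main(void)
{
    int ch = getchar();
    int q = 0;   /* initialized, so no path reads an indeterminate value */
    if (ch == 'Z')
        q = 5;
    printf("You typed %c and the magic value is %d", ch, q);
    return 0;
}
With q initialized, every input produces well-defined output, and the compiler has to keep the branch (or an equivalent) rather than assuming the input was 'Z'.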
In the Linux kernel code there is a macro used to test a bit (Linux version 2.6.2):
#define test_bit(nr, addr) \
    (__builtin_constant_p((nr)) \
     ? constant_test_bit((nr), (addr)) \
     : variable_test_bit((nr), (addr)))
where constant_test_bit and variable_test_bit are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
    int oldbit;

    __asm__ __volatile__(
        "btl %2,%1\n\tsbbl %0,%0"
        : "=r" (oldbit)
        : "m" (ADDR), "Ir" (nr));
    return oldbit;
}
I understand that __builtin_constant_p is used to detect whether a variable is a compile-time constant or unknown. My question is: is there any performance difference between these two functions depending on whether the argument is a compile-time constant? Why use the C version when it is a constant, and the assembly version when it's not?
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
int main(void) {
    unsigned long i, j = 21;
    unsigned long cnt = 0;
    srand(111);
    //j = rand() % 31;
    for (i = 1; i < (1 << 30); i++) {
        j = (j + 1) % 28;
        if (constant_test_bit(j, &i))
            cnt++;
    }
    if (__builtin_constant_p(j))
        printf("j is a compile time constant\n");
    return 0;
}
This correctly outputs the sentence j is a...
For the other situations just uncomment the line which assigns a "random" number to j and change the function name accordingly. When that line is uncommented the output will be empty, and this is expected.
I use gcc test.c -O1 to compile, and here is the result:
constant, constant_test_bit:
$ time ./a.out
j is compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (omitting time ./a.out; the same applies to the following):
j is compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the above results are typical values. It seems the constant_test_bit function is always faster than the variable_test_bit function, no matter whether the parameter is a compile-time constant or not... For the last two results (when j is not constant), the variable version is even dramatically slower than the constant one.
Am I missing something here?
Using godbolt we can do an experiment with constant_test_bit; the following two test functions are compiled with gcc using the -O3 flag:
// Non-constant expression test case
int func1(unsigned long i, unsigned long j)
{
    int x = constant_test_bit(j, &i);
    return x;
}

// Constant expression test case
int func2(unsigned long i)
{
    int x = constant_test_bit(21, &i);
    return x;
}
We see the optimizer is able to optimize the constant expression case to the following:
shrq $21, %rax
andl $1, %eax
while the non-constant expression case ends up as follows:
sarl $5, %eax
andl $31, %ecx
cltq
leaq -8(%rsp,%rax,8), %rax
movq (%rax), %rax
shrq %cl, %rax
andl $1, %eax
So the optimizer is able to produce much better code for the constant expression case. We can also see that the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit, so the implementer must believe the constant expression case for constant_test_bit ends up being better than:
btl %edi,8(%rsp)
sbbl %esi,%esi
for most cases.
As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at this case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:
movq (%rax), %rdi
even with an older gcc version, but this case may not be relevant to the cases test_bit is being used in. There may be more specific cases where this kind of optimization won't be possible.
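To sketch what a less gut-able microbenchmark could look like: keep the tested word changing inside the loop, make the bit index unknown at compile time, and print the accumulated result. This is my own sketch, not the kernel's real usage pattern, and it still only measures one narrow scenario:
#include <stdio.h>
#include <stdlib.h>

/* The C variant from the question, repeated so the sketch is self-contained. */
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

int main(void)
{
    unsigned long data;
    unsigned long cnt = 0;
    int j = rand() % 28;                 /* bit index not known at compile time */

    for (unsigned long i = 1; i < (1UL << 30); i++) {
        data = i;                        /* the word under test changes every pass */
        if (constant_test_bit(j, &data))
            cnt++;
    }

    printf("%lu\n", cnt);                /* use the result so the loop can't be discarded */
    return 0;
}
Even then, whatever numbers come out apply only to this loop on this machine; they don't say much about test_bit in its real kernel contexts.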
Consider the following hypothetical type:
typedef struct Stack {
    unsigned long len;
    void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability.) In these macros I am assuming that the given argument has type (Stack *) instead of merely Stack (I can't be bothered to type out a _Generic expression here.)
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) for checking if a stack has items? I thought that this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len because it would lead to double negation. In the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
    doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize the conditional (well, actually, only impossible if the thingy variable didn't have a constant initializer or was modified before the test, but we'll say in this instance that 31337 is not a constant), because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering if compilers will optimize something like the following
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
    doSomethingFarLessImportant();
Will this be optimized to actually just use (thingy) in the if statement, as if the if statement had been written as
if (thingy)
    doSomethingFarLessImportant();
If so, does it expand to (!!!!!thingy) and so on? (However, this is a slightly different question, as that can be optimized in any case: !thingy is !!!!!thingy no matter what, just like -(-(-(1))) = -1.)
In the question title I said "compilers", by that I mean that I am asking if any compiler does this, however I am particularly interested in how GCC will behave in this instance as it is my compiler of choice.
This seems like a pretty reasonable optimization and a quick test using godbolt with this code (see it live):
#include <stdio.h>

void func(int x)
{
    if (!!x)
    {
        printf("first\n");
    }

    if (!!!x)
    {
        printf("second\n");
    }
}

int main()
{
    int x = 0;
    scanf("%d", &x);
    func(x);
}
seems to indicate gcc handles this well; it generates the following:
func:
testl %edi, %edi # x
jne .L4 #,
movl $.LC1, %edi #,
jmp puts #
.L4:
movl $.LC0, %edi #,
jmp puts #
we can see from the first line:
testl %edi, %edi # x
it just uses x without doing any operations on it. Also notice the optimizer is clever enough to combine both tests into one, since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.
I'm building my project with GCC's -Wconversion warning flag. (gcc (Debian 4.3.2-1.1) 4.3.2) on a 64bit GNU/Linux OS/Hardware. I'm finding it useful in identifying where I've mixed types or lost clarity as to which types should be used.
It's not so helpful in most of the other situations that activate its warnings, and I'm asking how I am meant to deal with these:
enum { A = 45, B, C }; /* fine */
char a = A; /* huh? seems to not warn about A being int. */
char b = a + 1; /* warning converting from int to char */
char c = B - 2; /* huh? ignores this *blatant* int too.*/
char d = (a > b ? b : c); /* warning converting from int to char */
Due to the unexpected results of the above tests (cases a and c), I'm also asking for these differences to be explained.
Edit: And is it over-engineering to cast all these with (char) to prevent the warning?
Edit2: Some extra cases (following on from above cases):
a += A; /* warning converting from int to char */
a++; /* ok */
a += (char)1; /* warning converting from int to char */
Aside from that, what I'm asking is subjective and I'd like to hear how other people deal with the conversion warnings in cases like these when you consider that some developers advocate removing all warnings.
YAE:
One possible solution is to just use ints instead of chars, right? Well actually, not only does that require more memory, it is slower too, as can be demonstrated by the following code. The maths expressions are just there to get the warnings when built with -Wconversion. I assumed the version using char variables would run slower than the one using ints due to the conversions, but on my (64-bit dual core II) system the int version is slower.
#include <stdio.h>

#ifdef USE_INT
typedef int var;
#else
typedef char var;
#endif

int main()
{
    var start = 10;
    var end = 100;
    var n = 5;
    int b = 100000000;

    while (b > 0) {
        n = (start - 5) + (n - (n % 3 ? 1 : 3));
        if (n >= end) {
            n -= (end + 7);
            n += start + 2;
        }
        b--;
    }
    return 0;
}
Pass -DUSE_INT to gcc to build the int version of the above snippet.
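For example, building and timing both variants could look something like this (the output names are my own, test.c is whatever the snippet was saved as, and the question doesn't say which optimization level was used):
gcc -Wconversion -o bench_char test.c
gcc -Wconversion -DUSE_INT -o bench_int test.c
time ./bench_char
time ./bench_int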
When you say /* int */ do you mean it's giving you a warning about that? I'm not seeing any warnings at all in this code with gcc 4.0.1 or 4.2.1 with -Wconversion. The compiler is converting these enums into constants. Since everything is known at compile time, there is no reason to generate a warning. The compiler can optimize out all the uncertainty (the following is Intel with 4.2.1):
movb $45, -1(%rbp) # a = 45
movzbl -1(%rbp), %eax
incl %eax
movb %al, -2(%rbp) # b = 45 + 1
movb $44, -3(%rbp) # c = 44 (the math is done at compile time)
movzbl -1(%rbp), %eax
cmpb -2(%rbp), %al
jle L2
movzbl -2(%rbp), %eax
movb %al, -17(%rbp)
jmp L4
L2:
movzbl -3(%rbp), %eax
movb %al, -17(%rbp)
L4:
movzbl -17(%rbp), %eax
movb %al, -4(%rbp) # d = (a > b ? b : c)
This is without turning on optimizations. With optimizations, it will calculate b and d for you at compile time and hardcode their final values (if it actually needs them for anything). The point is that gcc has already worked out that there can't be a problem here because all the possible values fit in a char.
EDIT: Let me amend this somewhat. There is a possible error in the assignment of b, and the compiler will never catch it, even if it's certain. For example, if b = a + 250;, then this is certain to overflow b, but gcc will not issue a warning. That's because the assignment to a is legal, a is a char, and it's your problem (not the compiler's) to make sure that math doesn't overflow at runtime.
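As for the asker's edit about whether casting everything is over-engineering: an explicit cast does silence -Wconversion at each site, because it tells the compiler the narrowing is intentional. A sketch of where the casts would go (my own illustration, only covering the lines that warned in the question):
enum { A = 45, B, C };

void example(void)
{
    char a = A;
    char b = (char)(a + 1);          /* explicit cast, no -Wconversion warning */
    char c = B - 2;
    char d = (char)(a > b ? b : c);  /* likewise for the conditional expression */
    (void)b; (void)c; (void)d;       /* suppress unused-variable warnings */
}
Whether that extra noise is worth it is a style call; the casts change nothing at runtime, they only document that the truncation is deliberate.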
Maybe the compiler can already see that all the values fit into a char so it doesn't bother warning. I'd expect the enum to be resolved right at the beginning of the compilation.