Is strcmp slower than strncmp, given that strncmp can be passed a pre-calculated string length while strcmp receives no such information?
I am writing an interpreter. I am aware that both functions are optimized. I wonder which is the better approach in terms of performance, since I scan the input anyway and will know the offsets, and hence the lengths.
They do different things, so comparing them directly does not make sense. strncmp compares the first n (or fewer, if the string ends sooner) characters of a string. strcmp compares whole strings. If n is sufficiently large that strncmp will compare the whole strings (so that the behavior becomes effectively the same as strcmp) then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable or even present in a given implementation. For example an implementation of strcmp could just pass SIZE_MAX as the value for n to strncmp.
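To make the semantic difference concrete for the interpreter case, here is a minimal sketch. It assumes the scanner already knows each token's start and length; token_equals is a hypothetical helper name, not a library function. Because the length is known, it uses a length check plus memcmp rather than either strcmp or strncmp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the token text[0..len) equal the
 * NUL-terminated keyword kw? The token need not be NUL-terminated. */
static bool token_equals(const char *text, size_t len, const char *kw)
{
    /* strncmp(text, kw, len) == 0 on its own only proves that the first
     * len characters match, so the token "if" would also "match" the
     * keyword "ifdef". Checking the length first rules that out. */
    return strlen(kw) == len && memcmp(text, kw, len) == 0;
}
```

With this, token_equals("if (x)", 2, "if") is true while token_equals("if (x)", 2, "ifdef") is false, even though strncmp("if (x)", "ifdef", 2) returns 0. Whether this is faster than strcmp on a given platform is still something to measure.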
There is only one way to know: benchmark it. Speculation is of no use.
Be sure to do that with a sufficiently large number of strings and in representative conditions (statistical distribution of string lengths and statistical distribution of matching prefix lengths).
My bet is that there will be no significant difference.
You state that performance is a problem, so let's concentrate on that.
Implementations of library functions vary from compiler vendor to compiler vendor, and also across versions of the same compiler or development environment. Thus, Yves Daoust is correct when he says "there is only one way to know: benchmark it."
I would go further and suggest that if you haven't profiled your code, you start by doing that. The bottlenecks are all too often in surprising places you'd not expect.
It may do some good, however, to compare the implementations of strcmp() and strncmp() if you have the source code.
I once found myself in very nearly the same situation you are in. (Writing a front end information display that used multiple character based terminal backends to do its job. It required repeated near-real-time parsing of several text buffers.) The Borland compiler we were using at the time had an inefficient strncmp(). Since the processor had a length-limited instruction for comparing character buffers, I wrote a specialized variant of strncmp using assembler. "Before and after" benchmarks and profiling revealed we'd removed the primary bottleneck.
Several years later when folks went back to improve and modernize that system, the compiler and its library had changed (and the processors upgraded): there was no longer any real need for the (now obsolete) special version. New benchmarks also revealed that the bottlenecks had moved due to changing compilers, necessitating different optimizations.
I found that memcmp() returns as soon as it hits a differing byte, for example immediately when the first bytes of the two buffers differ, and I thought this poses a timing-attack risk. However, when I tried to find out whether other functions have side-channel risks like memcmp, I couldn't find any information.
Yes. strcmp and friends all work the same way. In the rare case that you are sensitive to timing attacks, you have to write all your own comparison loops. The compiler can quite often optimize them back into timing-sensitive loops now too, so you end up compiling such files with -O0. I know, so sad.
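A minimal sketch of such a hand-written loop is below. The name ct_compare is hypothetical, and the volatile qualifiers are only an attempt to discourage the optimizer from reintroducing an early exit; they are not a guarantee, so the generated code still needs to be inspected for the target compiler.

```c
#include <stddef.h>

/* Returns 0 if the two n-byte buffers are equal, nonzero otherwise.
 * Every byte is visited no matter where the first mismatch is. */
int ct_compare(const void *a, const void *b, size_t n)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;

    /* Accumulate differences instead of branching on them. */
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(pa[i] ^ pb[i]);

    return diff;
}
```

Note that this only reports equal or not equal; it does not give the ordering that memcmp provides. Where a vetted routine is available, such as OpenSSL's CRYPTO_memcmp, that is usually a safer choice than a home-grown loop.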
Typically you don't have this problem because you compare hashes.
I wanted to know if there are faster ways of comparing strings in C than strcmp(), especially when I have to compare a string against multiple pre-defined strings in a switch-statement fashion. In my application, the string to be compared can sometimes be as long as 1000 chars, so I was wondering whether strcmp() is sufficient or whether there is a more efficient way that I am not familiar with. I am working on a low-power embedded IoT project where more CPU cycles cost more power.
It doesn't sound as if the problem has as much to do with strcmp itself, as how you use it.
The fastest way to compare strings against a table of pre-defined strings is to ensure that the table is sorted alphabetically and then use binary search, with strcmp acting as the comparison function. The C standard bsearch may or may not be feasible on an embedded system. If it is not, binary search is fairly simple to implement yourself.
That is, unless the number of strings is vast. Then at some point, some manner of hash table will perform better than searching. To give an exact answer about what performs best, one needs all the details of the data.
With fixed-length strings you can improve performance ever so slightly by using memcmp instead - that way you don't have to check against null termination. But that's really a micro-optimization.
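The sorted-table lookup described above could look roughly like the following. The command names and the function command_index are made up for illustration; the only real requirement is that the table is sorted in strcmp order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical command table; it must stay sorted in strcmp order. */
static const char *const commands[] = { "off", "on", "reset", "status" };

/* bsearch passes a pointer to the key and a pointer to a table element. */
static int cmp_key(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of s in commands, or -1 if it is not present. */
int command_index(const char *s)
{
    const char *const *hit = bsearch(s, commands,
                                     sizeof commands / sizeof commands[0],
                                     sizeof commands[0], cmp_key);
    return hit ? (int)(hit - commands) : -1;
}
```

The returned index can then drive an ordinary switch statement, which gives the "switch on a string" behaviour with about log2(N) calls to strcmp instead of N.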
I wrote two different algorithms that solve a particular case of string matching (implemented in C). I know that the theoretical big-O complexity of the two algorithms is the same, but I think that in practice one is better than the other.
My question is: could someone recommend a paper or other reading that shows how to compare algorithms with a practical approach?
I have several test sets, and I'm interested in measuring execution time and memory use. I need to take these measurements as independently as possible of the operating system and of other programs that could be running concurrently.
Thanks!!!
You could compare your algorithms by generating the assembly code for each and comparing the results.
You can generate the assembly code with the gcc -S mycode.c command.
I find that "looking at the code" is a good start. If one uses more variables and is more complicated than the other, it is probably slower.
However, there are of course clever tricks that can make a more complicated function actually run faster, for example code that reads 8 bytes at a time. Once such code finds a difference it has more work to do to locate the exact byte, but for long strings that are largely similar there is a big win.
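As an illustration of the 8-bytes-at-a-time idea, here is a minimal sketch. It assumes both buffers have a known length n, so it never reads past the end; first_difference is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the index of the first byte at which a and b differ,
 * or n if the first n bytes are identical. */
size_t first_difference(const char *a, const char *b, size_t n)
{
    size_t i = 0;

    /* Fast path: compare 8 bytes per iteration. memcpy sidesteps
     * alignment and aliasing problems; compilers typically turn it
     * into a single load. */
    while (n - i >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            break;              /* the mismatch is inside this word */
        i += sizeof(uint64_t);
    }

    /* Slow path: pin down the exact byte and handle the tail. */
    while (i < n && a[i] == b[i])
        i++;

    return i;
}
```

The extra complexity is the second loop: the word comparison only says that some byte in the word differs, not which one.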
So, in the end, there is no substitute for actually running the code, using clock-cycle timing (RDTSC instruction on x86 processors, for example), or running a large loop to execute the code many times to give a reasonable length runtime.
If your code isn't supposed to run on a single embedded target, you probably want to run the code on a set of different hardware to determine if the code that is faster on processor A is also faster on B, C and D type processors. Often this does work, but sometimes you can find that a particular processor model is faster for SOME operations, and another is faster for another (for example based on cache-size, etc).
It would also be very important, in the case of string operations, to try with different size inputs, different points of difference (e.g. a long string, but different "early", vs. long string with difference "late"). Sometimes, the different approaches will show different results for short/long strings or early/late point of difference (and of course "equal" strings that are long or short).
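A rough sketch of the "large loop" style of measurement, parameterised by string length and by the position of the first difference, might look like this. The function name time_strcmp and its parameters are assumptions for illustration, and clock() measures CPU time at a fairly coarse resolution, so reps has to be large enough to give a meaningful runtime.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Builds two equal strings of length len, then changes b at index
 * diff_at (pass diff_at >= len to keep them equal). Times reps calls
 * to strcmp and returns the elapsed CPU time in seconds, or -1.0 if
 * allocation fails. */
double time_strcmp(size_t len, size_t diff_at, long reps)
{
    char *a = malloc(len + 1);
    char *b = malloc(len + 1);
    if (!a || !b) { free(a); free(b); return -1.0; }

    memset(a, 'x', len);
    a[len] = '\0';
    memcpy(b, a, len + 1);
    if (diff_at < len)
        b[diff_at] = 'y';

    /* Calling through a volatile function pointer and storing into a
     * volatile sink keeps the compiler from removing the calls. */
    int (*volatile cmp)(const char *, const char *) = strcmp;
    volatile int sink = 0;

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        sink = cmp(a, b);
    clock_t stop = clock();
    (void)sink;

    free(a);
    free(b);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing, say, time_strcmp(1000, 0, reps) against time_strcmp(1000, 999, reps) and time_strcmp(16, 16, reps) covers the early-difference, late-difference and short-equal cases mentioned above. The same harness can be pointed at any other comparison function with a matching signature.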
To round out the other answers: I found a book called "A Guide to Experimental Algorithmics" by Catherine C. McGeoch, and a professor recommended a practical paper to me.
I have always implemented the C code for reversing a string by:
placing pointers at the beginning and end of the string
looping up to half the length of the string
swapping the characters one pair at a time.
But I want optimized code that reduces the time complexity of this problem compared with the approach above. I tried a Google search but did not find any relevant solution.
If by "time complexity" you're referring to the big-O notation which excludes coefficients and lower-order terms, you will not be able to beat a simple O(n) algorithm for reversing a C string.
If you're referring to the time it takes for a specific machine (or class of machines) to execute the operation, there are a number of approaches to optimizing the reversal. Typical optimizations include loop unrolling, consuming the characters machine word by machine word instead of character by character, and a smart search for the terminating NUL character. The freely available GNU libc contains examples of such optimizations.
Some of the above optimizations, such as loop unrolling, may be applied automatically by optimizing compilers. Others may be counter-productive on some platforms, or their speedup may depend on the size of the string. In some cases hand-written optimization can hinder the compiler's own effort to optimize the code. The only way to be sure you're not making things worse is to develop a benchmark that covers your intended usage and to benchmark your code meticulously as you progress.
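#include <string.h>

/* Reverse str in place by swapping characters from both ends. */
void reverse(char *str)
{
    size_t len = strlen(str);
    for (size_t fctr = 0, bctr = len - 1; fctr < len / 2; fctr++, bctr--) {
        char temp = str[fctr];
        str[fctr] = str[bctr];
        str[bctr] = temp;
    }
}
This may work fast!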
I have a file that's 21056 bytes.
I've written a program in C that reads the entire file into a buffer, and then uses multiple search algorithms to search the file for a token that's 82 chars.
I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. I've used: KMP, BM, TBM, and Horspool. And then I used strstr and benchmarked each one.
What puzzles me is that strstr outperforms all the other algorithms every time. The only one that is sometimes faster is BM.
Shouldn't strstr be the slowest?
Here's my benchmark code with an example of benchmarking BM:
double get_time()
{
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);
    return (double)t.QuadPart/(double)f.QuadPart;
}

before = get_time();
BM(token, strlen(token), buffer, len);
after = get_time();
printf("Time: %f\n\n", after - before);
Could someone explain to me why strstr is outperforming the other search algorithms? I'll post more code on request if needed.
Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.
(The reason I think this is likely is that programmers love to implement such things.)
Horspool, KMP et al are optimal at minimizing the number of byte-comparisons.
However, that's not the bottleneck on a modern processor. On an x86/64 processor, your string is being loaded into L1 cache in cache-line-width chunks (typically 64 bytes). No matter how clever your algorithm is, unless it gives you strides that are larger than that, you gain nothing; and the more complicated Horspool code (at least one table lookup per step) can't compete.
Furthermore, you're stuck with the "C" string constraint of null-termination: SOMEWHERE the code has to examine every byte.
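For reference, the table setup and per-step table lookup that Horspool needs can be seen in a minimal sketch like the following. This is a textbook-style version written for illustration, with explicit lengths; it is not the code from the question or from any particular library.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search for pat[0..m) in text[0..n).
 * Returns a pointer to the first match, or NULL if there is none. */
const char *horspool(const char *text, size_t n, const char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];

    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    /* Setup cost: a 256-entry table filled before any searching. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    for (size_t pos = 0; pos <= n - m; ) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        /* The table lookup: advance by the shift for the last byte
         * of the current window. */
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

The largest possible stride is m bytes, so for a pattern shorter than a cache line the skipped bytes are still fetched along with the rest of the line.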
strstr() is expected to be optimal for a wide range of cases; e.g. searching for tiny strings like "\r\n" in a short string, as well as much longer ones where some smarter algorithm might have a hope. The basic strchr/memcmp loop is pretty hard to beat across the whole range of likely inputs.
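A minimal sketch of that basic loop is shown below, under one stated change: it uses memchr with a precomputed haystack length instead of strchr, so that memcmp can never read past the terminator. simple_strstr is a hypothetical name, and this is not how any particular C library implements strstr.

```c
#include <string.h>

/* Find the first occurrence of needle in hay, or NULL if absent. */
char *simple_strstr(const char *hay, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return (char *)hay;

    const char *end = hay + strlen(hay);

    /* Jump to each occurrence of the first needle byte, then verify
     * the whole needle with memcmp. The memchr range stops where the
     * needle could no longer fit. */
    for (const char *p = hay; (size_t)(end - p) >= nlen; p++) {
        p = memchr(p, needle[0], (size_t)(end - p) - nlen + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, nlen) == 0)
            return (char *)p;
    }
    return NULL;
}
```

There is no table to build, so the cost for tiny needles such as "\r\n" is essentially one memchr call, and memchr and memcmp are themselves usually heavily optimized by the library.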
Pretty much all x86-compatible processors since 2003 have supported SSE2. If you disassembled strlen()/x86 for glibc, you may have noticed that it uses some SSE2 PCMPEQ and MOVMASK operations to search for the null terminator 16 bytes at a time. The solution is so efficient that it beats the obvious super-simple loop, for anything longer than the empty string.
I took that idea and came up with a strstr() that beats glibc's strstr() for all cases greater than 1 byte --- where the relative difference is pretty much moot. If you're interested, check out:
Convergence SSE2 and strstr()
A better strstr() with no ASM code
If you care to see a non-SSE2 solution that dominates strstr() for target strings of more than 15 bytes, check out:
which makes use of multibyte comparisons rather than strchr(), to find a point at which to do a memcmp.
BTW you've probably figured out by now that the x86 REP SCASB/REP CMPSB ops fall on their ass for anything longer than 32 bytes, and are not much improvement for shorter strings. Wish Intel had devoted a little more attention to that, than to adding SSE4.2 "string" ops.
For strings large enough to matter, my perf tests show that BNDM is better than Horspool, across the board. BNDM is more tolerant of "pathological" cases, such as targets that heavily repeat the last byte of a pattern. BNDM can also make use of SSE2 (128bit registers) in a way that competes with 32bit registers in efficiency and start-up cost. Source code here.
Without seeing your code, it's hard to say exactly. strstr is heavily optimized, and usually written in assembly language. It does things like reading data 4 bytes at a time and comparing them (bit-twiddling if necessary if the alignment isn't right) to minimize memory latency. It can also take advantage of things like SSE to load 16 bytes at a time. If your code is only loading one byte at a time, it's probably getting killed by memory latency.
Use your debugger and step through the disassembly of strstr -- you'll probably find some interesting things in there.
Imagine you want to get something cleaned. You could just clean it yourself, or you could hire ten professional cleaners to clean it. If the cleaning job is an office building, the latter solution would be preferable. If the cleaning job was one window, the former would be preferable.
You never get any payback for the time spent setting up to do the job efficiently because the job doesn't take very long.