I want to know whether there are faster ways of comparing strings in C than strcmp(), especially when I have to compare one string against multiple pre-defined strings in a switch-statement fashion. In my application, the string to be compared can be as long as 1000 characters, so I am wondering whether strcmp() is sufficient or whether there is a better, more efficient way that I am not familiar with. I am working on a low-power embedded IoT project, where extra CPU cycles cost power.
It doesn't sound as if the problem has as much to do with strcmp itself as with how you use it.
The fastest way to compare a string against a table of pre-defined strings is to keep the table sorted alphabetically and use binary search, with strcmp as the comparison function. The C standard bsearch may or may not be feasible on an embedded system; otherwise, it is fairly simple to implement yourself.
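As an illustration, here is a minimal sketch of that approach using the standard bsearch; the command table and its contents are hypothetical placeholders:

    /* A minimal sketch of the sorted-table + binary-search lookup described
       above. The command names are hypothetical examples. */
    #include <stdlib.h>
    #include <string.h>

    /* Must be kept sorted in strcmp order for bsearch to work. */
    static const char *commands[] = { "get", "put", "reset", "status" };

    static int cmp(const void *key, const void *elem)
    {
        return strcmp((const char *)key, *(const char **)elem);
    }

    /* Returns the table index of s, or -1 if it is not a known command. */
    int lookup(const char *s)
    {
        const char **hit = bsearch(s, commands,
                                   sizeof commands / sizeof commands[0],
                                   sizeof commands[0], cmp);
        return hit ? (int)(hit - commands) : -1;
    }

The returned index can then drive the switch-style dispatch, so each incoming string is compared against only about log2(N) table entries instead of all of them.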
That is, unless the number of strings is vast; at some point, some manner of hash table will perform better than searching. To give an exact answer about what performs best, one needs all the details of the data.
With fixed-length strings you can improve performance ever so slightly by using memcmp instead - that way you don't have to check against null termination. But that's really a micro-optimization.
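For example, a small sketch assuming every key is padded to a fixed length (KEY_LEN is an assumed constant, not something from the question):

    #include <string.h>

    #define KEY_LEN 8   /* assumed fixed key length, including padding */

    /* memcmp never has to test each byte for the terminating '\0'. */
    int keys_equal(const char a[KEY_LEN], const char b[KEY_LEN])
    {
        return memcmp(a, b, KEY_LEN) == 0;
    }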
Related
Is strcmp slower than strncmp, given that strncmp can be passed a pre-calculated string length while strcmp receives no such information?
I am writing an interpreter. I am aware that both functions are optimized. I wonder which will be the better approach in terms of performance, as I will be scanning the input anyway and will know the offsets, and hence the lengths.
They do different things, so comparing them directly does not make sense. strncmp compares at most the first n characters of two strings (fewer if a string ends sooner); strcmp compares whole strings. If n is large enough that strncmp ends up comparing the whole strings (so that the behavior is effectively the same as strcmp), then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable, or even present, in a given implementation. For example, an implementation of strcmp could just call strncmp with SIZE_MAX as the value for n.
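To make the counter overhead concrete, here are naive reference versions of the two functions; they only illustrate the extra per-byte test, and real library implementations are far more heavily optimized:

    #include <stddef.h>

    int my_strcmp(const char *a, const char *b)
    {
        while (*a && *a == *b) { a++; b++; }
        return (unsigned char)*a - (unsigned char)*b;
    }

    int my_strncmp(const char *a, const char *b, size_t n)
    {
        /* identical, except for one extra test and decrement per byte */
        while (n && *a && *a == *b) { a++; b++; n--; }
        return n ? (unsigned char)*a - (unsigned char)*b : 0;
    }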
There is only one way to know: benchmark it. Speculation is of no use.
Be sure to do that with a sufficiently large number of strings and in representative conditions (statistical distribution of string lengths and statistical distribution of matching prefix lengths).
My bet is that there will be no significant difference.
You state that performance is a problem, so let's concentrate on that.
Implementations of library functions vary from compiler vendor to compiler vendor, and also across versions of the same compiler or development environment. Thus, Yves Daoust is correct when he says "there is only one way to know: benchmark it."
I would go further and suggest that if you haven't profiled your code, you start by doing that. The bottlenecks are all too often in surprising places you'd not expect.
It may do some good, however, to compare the implementations of strcmp() and strncmp() if you have the source code.
I once found myself in very nearly the same situation you are in. (Writing a front end information display that used multiple character based terminal backends to do its job. It required repeated near-real-time parsing of several text buffers.) The Borland compiler we were using at the time had an inefficient strncmp(). Since the processor had a length-limited instruction for comparing character buffers, I wrote a specialized variant of strncmp using assembler. "Before and after" benchmarks and profiling revealed we'd removed the primary bottleneck.
Several years later when folks went back to improve and modernize that system, the compiler and its library had changed (and the processors upgraded): there was no longer any real need for the (now obsolete) special version. New benchmarks also revealed that the bottlenecks had moved due to changing compilers, necessitating different optimizations.
I wrote two different algorithms that solve a particular case of string matching (implemented in C). I know that the theoretical big-O of these algorithms is the same, but I think that in practice one is better than the other.
My question is: could someone recommend a paper or some other reading that shows how to compare algorithms with a practical approach?
I have several test sets, and I am interested in measuring execution time and memory usage. I need to take these measurements as independently as possible of the operating system and of other programs that could be running concurrently.
Thanks!!!
You could compare your algorithms by generating the assembly code for each and comparing them.
You can generate the assembly code with the gcc -S mycode.c command.
I find that "looking at the code" is a good start. If one uses more variables and is more complicated than the other, it is probably slower.
However, there are of course clever tricks that can make a more complicated function actually run faster, for example code that reads 8 bytes at a time. The code gets more complex once a difference is found within those 8 bytes, but for long strings that are largely similar, it is a big win.
So, in the end, there is no substitute for actually running the code, using clock-cycle timing (the RDTSC instruction on x86 processors, for example), or running a large loop that executes the code many times to give a runtime of reasonable length.
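As a concrete starting point, here is a minimal timing-harness sketch using the standard clock(); the test strings, the "late" mismatch position and the iteration count are placeholders to be replaced with your own data:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        static char a[1001], b[1001];
        memset(a, 'x', 1000);
        memset(b, 'x', 1000);
        b[999] = 'y';                 /* difference near the end: a "late" mismatch */

        volatile int sink = 0;        /* keeps the compiler from removing the loop */
        const long iterations = 1000000;

        clock_t t0 = clock();
        for (long i = 0; i < iterations; i++)
            sink += strcmp(a, b);
        clock_t t1 = clock();

        printf("%.1f ns per call (sink=%d)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / iterations, sink);
        return 0;
    }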
If your code isn't supposed to run on a single embedded target, you probably want to run the code on a set of different hardware to determine if the code that is faster on processor A is also faster on B, C and D type processors. Often this does work, but sometimes you can find that a particular processor model is faster for SOME operations, and another is faster for another (for example based on cache-size, etc).
It would also be very important, in the case of string operations, to try inputs of different sizes and with different points of difference (e.g. a long string where the difference comes early vs. a long string where the difference comes late). Sometimes, the different approaches will show different results for short/long strings or early/late points of difference (and of course for equal strings that are long or short).
To round off all the comments: I found a book called "A Guide to Experimental Algorithmics" by Catherine C. McGeoch, and a professor also recommended a practical paper to me.
I need to make a hash table that can eventually be used to write a full assembler.
Basically I will have something like:
foo 100,
and I will need to hash foo and then store the 100 (the address of the command). I was thinking I should just use a 2d array. The second dimension of the array would only be accessed when recording the address (just an int) or when returning the address. There would be no searching done in the second dimension.
If I implement the hash table this way, would it be inefficient? If it is very inefficient, what would be a better way to implement the table?
Edit: I haven't written any code yet. In fact I don't even know what language I'm going to use yet. I want to write it in C so it will be more of a challenge, but I might write it in Java if I feel pressured for time.
If you have every other int in the array unused then in addition to memory waste you're going to use the cache poorly as the cache lines will be underused.
But normally I wouldn't worry about such things when writing an assembler, as it's not as performance-demanding as, say, graphics or heavy computation. At least, I wouldn't rush into optimizing too early.
It is, however, important to keep in mind that once you start assembling large pieces of code (~100,000 lines of assembly) generated automatically (say, from C/C++ code by a compiler), performance will become more and more important as the user experience (wait times) degrades. At that point there will be many candidates for optimization: I/O, parsing, symbol look up, generation of as short as possible jump instructions if they can have multiple encodings for shorter and longer jumps. Expressions and macros will contribute too. You may even consider minimizing white space and comments in the input assembly code in the first place.
Without being able to see any code, there is no reason that this would have to be inefficient. The only way it could be is if you pre-allocated a bunch of memory that you did not end up using; however, without seeing the algorithm you have in mind, it is impossible to tell.
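For reference, here is a minimal sketch of a label-to-address table using open addressing; the table size, hash function and API are illustrative assumptions, not the design the question describes:

    #include <stdlib.h>
    #include <string.h>

    #define TABLE_SIZE 4096          /* power of two, so we can mask instead of mod */

    struct symbol {
        char *name;                  /* NULL marks an empty slot */
        int   address;
    };

    static struct symbol table[TABLE_SIZE];

    static unsigned hash(const char *s)
    {
        unsigned h = 2166136261u;    /* FNV-1a style string hash */
        while (*s)
            h = (h ^ (unsigned char)*s++) * 16777619u;
        return h;
    }

    /* Insert or update a label; returns 0 on success, -1 if the table is full. */
    int sym_put(const char *name, int address)
    {
        unsigned i = hash(name) & (TABLE_SIZE - 1);
        for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
            if (table[i].name == NULL) {
                table[i].name = strdup(name);   /* POSIX; use malloc+memcpy if unavailable */
                table[i].address = address;
                return 0;
            }
            if (strcmp(table[i].name, name) == 0) {
                table[i].address = address;
                return 0;
            }
        }
        return -1;
    }

    /* Look up a label; returns its address, or -1 if it is not defined. */
    int sym_get(const char *name)
    {
        unsigned i = hash(name) & (TABLE_SIZE - 1);
        for (unsigned n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
            if (table[i].name == NULL)
                return -1;
            if (strcmp(table[i].name, name) == 0)
                return table[i].address;
        }
        return -1;
    }

If you end up choosing Java instead, HashMap<String, Integer> gives you the same mapping without any of this work.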
I have a number of very large length, maybe up to 50 digits. I am taking it as string input. However, I need to perform operations on it, so I need to convert it to a proper base, let's say 256.
What will be the best algorithm to do so?
Multiple-precision arithmetic (a.k.a. bignums) is a difficult subject, and the good algorithms are non-intuitive (there are books about that).
There exist several libraries handling bignums, e.g. the GMP library (and there are others). Most of them take advantage of hardware instructions (e.g. add with carry) through carefully tuned small chunks of assembler code, so they perform better than anything you would be able to write in a couple of months.
I strongly recommend using existing bignum libraries. Writing your own would take you years of work, if you want it to be competitive.
See also answers to this question.
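If a library such as GMP is an option, the conversion itself is just a few calls: parse the decimal string, then export the value as base-256 bytes. A minimal sketch (the input literal is a placeholder; compile with -lgmp):

    #include <stdio.h>
    #include <stdlib.h>
    #include <gmp.h>

    int main(void)
    {
        const char *decimal = "12345678901234567890123456789012345678901234567890";

        mpz_t n;
        mpz_init(n);
        mpz_set_str(n, decimal, 10);               /* parse the base-10 string */

        size_t count;
        /* order=1: most significant byte first; size=1: one byte per word */
        unsigned char *bytes = mpz_export(NULL, &count, 1, 1, 1, 0, n);

        for (size_t i = 0; i < count; i++)
            printf("%02x ", bytes[i]);
        printf("\n");

        free(bytes);    /* with a NULL buffer, mpz_export allocates via the GMP allocator (malloc by default) */
        mpz_clear(n);
        return 0;
    }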
I'm trying to evaluate different substring search (à la strstr) algorithms and implementations, and I'm looking for some well-crafted needle and haystack strings that will catch worst-case performance and possible corner-case bugs. I suppose I could work them out myself, but I figure someone has to have a good collection of test cases sitting around somewhere...
Some thoughts and a partial answer to myself:
Worst case for brute force algorithm:
a^(n+1) b in (a^n b)^m
e.g. aaab in aabaabaabaabaabaabaab
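A small sketch that builds such a pair for testing; n and m are arbitrary illustrative values, and the needle never occurs in the haystack, so a naive search does close to n work at every position:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const size_t n = 10, m = 100000;

        char *needle = malloc(n + 3);              /* a^(n+1) b */
        memset(needle, 'a', n + 1);
        needle[n + 1] = 'b';
        needle[n + 2] = '\0';

        char *haystack = malloc(m * (n + 1) + 1);  /* (a^n b)^m */
        for (size_t i = 0; i < m; i++) {
            memset(haystack + i * (n + 1), 'a', n);
            haystack[i * (n + 1) + n] = 'b';
        }
        haystack[m * (n + 1)] = '\0';

        printf("%s\n", strstr(haystack, needle) ? "found" : "not found");

        free(needle);
        free(haystack);
        return 0;
    }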
Worst case for SMOA:
Something like yxyxyxxyxyxyxx in (yxyxyxxyxyxyxy)^n. Needs further refinement. I'm trying to ensure that each advancement is only half the length of the partial match, and that maximal suffix computation requires the maximal amount of backtracking. I'm pretty sure I'm on the right track because this type of case is the only way I've found so far to make my implementation of SMOA (which is asymptotically 6n+5) run slower than glibc's Two-Way (which is asymptotically 2n-m but has moderately painful preprocessing overhead).
Worst case for anything rolling-hash based:
Whatever sequence of bytes causes hash collisions with the hash of the needle. For any reasonably-fast hash and a given needle, it should be easy to construct a haystack whose hash collides with the needle's hash at every point. However, it seems difficult to simultaneously create long partial matches, which are the only way to get the worst-case behavior. Naturally for worst-case behavior the needle must have some periodicity, and a way of emulating the hash by adjusting just the final characters.
Worst case for Two-Way:
Seems to be very short needle with nontrivial MS decomposition - something like bac - where the haystack contains repeated false positives in the right-half component of the needle - something like dacdacdacdacdacdacdac. The only way this algorithm can be slow (other than by glibc authors implementing it poorly...) is by making the outer loop iterate many times and repeatedly incur that overhead (and making the setup overhead significant).
Other algorithms:
I'm really only interested in algorithms that are O(1) in space and have low preprocessing overhead, so I haven't looked at their worst cases so much. At least Boyer-Moore (without the modifications to make it O(n)) has a nontrivial worst-case where it becomes O(nm).
This doesn't answer your question directly, but you may find the algorithms in the book Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology interesting (it has many novel algorithms for substring search). It is also a good source of special and complex cases.
A procedure that might give interesting statistics, though I have no time to test it right now (a rough sketch follows the list):
Randomize over string length,
then randomize over string contents of that length,
then randomize over offset/length of a substring (possibly something not in the string),
then randomly clobber the substring (possibly not at all),
repeat.
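A rough sketch of that procedure, driving strstr as a stand-in for the implementation under test; the alphabet, lengths and iteration count are arbitrary assumptions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define MAX_LEN 1000

    int main(void)
    {
        srand((unsigned)time(NULL));

        char haystack[MAX_LEN + 1], needle[MAX_LEN + 1];
        long matches = 0;

        for (int iter = 0; iter < 100000; iter++) {
            /* randomize over string length */
            size_t len = 1 + (size_t)rand() % MAX_LEN;

            /* randomize over string contents (small alphabet: more repetition) */
            for (size_t i = 0; i < len; i++)
                haystack[i] = 'a' + rand() % 4;
            haystack[len] = '\0';

            /* randomize over offset/length of a substring */
            size_t off  = (size_t)rand() % len;
            size_t nlen = 1 + (size_t)rand() % (len - off);
            memcpy(needle, haystack + off, nlen);
            needle[nlen] = '\0';

            /* randomly clobber the substring (possibly not at all) */
            if (rand() % 2)
                needle[(size_t)rand() % nlen] = 'a' + rand() % 4;

            if (strstr(haystack, needle))
                matches++;
        }
        printf("%ld of the generated needles matched\n", matches);
        return 0;
    }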
You can generate container strings (resp., contained test values) recursively by:
Starting with the empty string, generate all strings obtained by augmenting a string currently in the set with a character from an alphabet added on the left or on the right (both).
The alphabet for generating container strings is chosen by you.
You test 2 alphabets for contained strings. One is the one that makes up container strings, the other is its complement.
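A minimal sketch of that recursive generation, assuming a two-character alphabet and a small maximum length; a real harness would also deduplicate the results, since left- and right-extension can produce the same string:

    #include <stdio.h>
    #include <string.h>

    #define MAX_LEN 4

    static const char alphabet[] = "ab";     /* alphabet for container strings */

    static void generate(const char *s)
    {
        size_t len = strlen(s);
        printf("%s\n", s);
        if (len == MAX_LEN)
            return;

        char buf[MAX_LEN + 2];
        for (const char *c = alphabet; *c; c++) {
            /* extend on the right */
            memcpy(buf, s, len);
            buf[len] = *c;
            buf[len + 1] = '\0';
            generate(buf);

            /* extend on the left */
            buf[0] = *c;
            memcpy(buf + 1, s, len + 1);
            generate(buf);
        }
    }

    int main(void)
    {
        generate("");     /* start with the empty string */
        return 0;
    }

The contained test strings can be generated the same way, once over the container alphabet and once over its complement.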