I have always implemented the C code for reversing a string as:
looping an index up to the length of the string, or up to half of its length,
placing pointers at the end and the beginning of the string,
and swapping the characters one by one.
But I want optimized code that reduces the time complexity of this problem, beyond the approach I mentioned. I tried a Google search but did not find any relevant solution.
If by "time complexity" you're referring to the big-O notation which excludes coefficients and lower-order terms, you will not be able to beat a simple O(n) algorithm for reversing a C string.
If you're referring to the time it takes for a specific machine (or class of machines) to execute the operation, there are a number of approaches to optimizing the reversal. Typical optimizations include loop unrolling, consuming the characters machine word by machine word instead of character by character, and a smart search for the terminating NUL character. The freely available GNU libc contains examples of such optimizations.
Some of the above optimizations, such as loop unrolling, may be automatically implemented by optimizing compilers. Others may be counter-productive on some platforms, or their speedup may depend on the size of the string. In some cases hand-written optimization can hinder the compiler's own effort to optimize the code. The only way to be sure you're not making things worse is to develop a benchmark that covers your intended usage and measure meticulously as you progress.
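To make the loop-unrolling idea concrete, here is a hedged sketch of an unrolled swap loop (the function name is mine, and a compiler at -O2 or higher may well perform an equivalent transformation on the simple loop automatically, so benchmark before keeping it):

#include <stddef.h>

void reverse_unrolled(char *str, size_t len)
{
    if (len < 2)
        return;
    char *a = str, *b = str + len - 1, tmp;
    size_t half = len / 2;

    /* four swaps per iteration cuts loop overhead */
    while (half >= 4) {
        tmp = a[0]; a[0] = b[0];  b[0]  = tmp;
        tmp = a[1]; a[1] = b[-1]; b[-1] = tmp;
        tmp = a[2]; a[2] = b[-2]; b[-2] = tmp;
        tmp = a[3]; a[3] = b[-3]; b[-3] = tmp;
        a += 4; b -= 4; half -= 4;
    }
    /* handle the remaining 0-3 swaps */
    while (half--) {
        tmp = *a; *a++ = *b; *b-- = tmp;
    }
}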
void reverse(char *str)
{
    size_t len = strlen(str);   /* needs <string.h> */
    size_t fctr, bctr;
    char temp;
    /* walk inward from both ends, swapping characters */
    for (fctr = 0, bctr = len - 1; fctr < len / 2; fctr++, bctr--) {
        temp = str[fctr];
        str[fctr] = str[bctr];
        str[bctr] = temp;
    }
}
This may work fast!
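A quick usage sketch, assuming the loop above is wrapped in a reverse() function as shown:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "hello";
    reverse(s);
    printf("%s\n", s);   /* prints "olleh" */
    return 0;
}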
I heard that
the string operations (e.g. strlen()) in the C standard library access and operate on the characters of a string, one character at a time.
Computers access memory one word at a time.
Accessing one character at a time is inefficient, so the time cost of these string operations is high.
Are the above true?
What solutions can be used for improving the time performance of string operations?
The assumption in the question is false. Optimized implementations of strlen and other string operations in fact work word-at-a-time.
The GNU C Library ("glibc") has hand-optimized assembly routines for this, such as this one for x86_64.
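To illustrate the word-at-a-time idea, here is a minimal sketch (this is not glibc's actual code; the zero-byte test is the classic bit trick, and casting to unsigned long * is a simplification that real implementations justify more carefully):

#include <stddef.h>
#include <stdint.h>

size_t wordwise_strlen(const char *s)
{
    const char *p = s;
    /* advance byte by byte until p is word-aligned */
    while ((uintptr_t)p % sizeof(unsigned long) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    const unsigned long ones = (unsigned long)-1 / 0xFF;  /* 0x0101...01 */
    const unsigned long highs = ones << 7;                /* 0x8080...80 */
    const unsigned long *w = (const unsigned long *)p;
    /* (x - ones) & ~x & highs is nonzero iff some byte of x is zero */
    while (((*w - ones) & ~*w & highs) == 0)
        w++;
    /* the NUL is somewhere in this word; finish byte by byte */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}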
Is strcmp slower than strncmp, given that one can pass a pre-calculated string length to the latter, while strcmp receives no such information?
I am writing an interpreter. I am aware that these functions are both optimized. I wonder which will be the better approach (in terms of performance), since I will scan the input anyway and will know the offset positions, and hence the lengths.
They do different things, so comparing them directly does not make sense. strncmp compares the first n characters of a string (or fewer, if a string ends sooner). strcmp compares whole strings. If n is large enough that strncmp compares the whole strings (so that its behavior becomes effectively the same as strcmp's), then strncmp is likely to be moderately slower, because it also has to keep track of a counter; but the difference might or might not be measurable, or even present, in a given implementation. For example, an implementation of strcmp could just pass SIZE_MAX as the value of n to strncmp.
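To illustrate that last point, a purely hypothetical sketch (no real libc is claimed to do exactly this):

#include <stdint.h>
#include <string.h>

/* hypothetical: strcmp expressed in terms of strncmp */
int my_strcmp(const char *a, const char *b)
{
    /* n is so large it never limits the comparison */
    return strncmp(a, b, SIZE_MAX);
}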
There is only one way to know: benchmark it. Speculation is of no use.
Be sure to do that with a sufficiently large number of strings and in representative conditions (statistical distribution of string lengths and statistical distribution of matching prefix lengths).
My bet is that there will be no significant difference.
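For what it's worth, a minimal benchmark sketch along those lines (the strings and iteration count are placeholders; substitute inputs with your real length and prefix distributions):

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const char *a = "some representative key string";
    const char *b = "some representative key strinG";
    size_t n = strlen(a);
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    long i;

    clock_t t0 = clock();
    for (i = 0; i < 10000000L; i++)
        sink += strcmp(a, b);
    clock_t t1 = clock();
    for (i = 0; i < 10000000L; i++)
        sink += strncmp(a, b, n);
    clock_t t2 = clock();

    printf("strcmp:  %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("strncmp: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    (void)sink;
    return 0;
}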
You state that performance is a problem, so let's concentrate on that.
Implementations of library functions vary from compiler vendor to compiler vendor, and also across versions of the same compiler or development environment. Thus, Yves Daoust is correct when he says "there is only one way to know: benchmark it."
I would go further and suggest that if you haven't profiled your code, you start by doing that. The bottlenecks are all too often in places you'd never expect.
It may do some good, however, to compare the implementations of strcmp() and strncmp() if you have the source code.
I once found myself in very nearly the same situation you are in. (I was writing a front-end information display that used multiple character-based terminal backends to do its job. It required repeated near-real-time parsing of several text buffers.) The Borland compiler we were using at the time had an inefficient strncmp(). Since the processor had a length-limited instruction for comparing character buffers, I wrote a specialized variant of strncmp in assembler. "Before and after" benchmarks and profiling revealed we'd removed the primary bottleneck.
Several years later when folks went back to improve and modernize that system, the compiler and its library had changed (and the processors upgraded): there was no longer any real need for the (now obsolete) special version. New benchmarks also revealed that the bottlenecks had moved due to changing compilers, necessitating different optimizations.
I wrote two different algorithms that solve a particular case of string matching (implemented in C). I know that the theoretical big-O of these algorithms is the same, but I think that in practice one is better than the other.
My question is: could someone recommend a paper or other reading that shows how to compare algorithms from a practical standpoint?
I have several test sets, and I'm interested in measuring execution time and memory usage. I need to take these measurements as independently as possible of the operating system and of other programs that could be running concurrently.
Thanks!
You could compare your algorithms by generating the assembly code for each and comparing them.
You can generate the assembly with the gcc -S mycode.c command.
I find that "looking at the code" is a good start. If one uses more variables and is more complicated than the other, it is probably slower.
However, there are of course clever tricks that can make a more complicated function actually run faster, for example code that reads 8 bytes at a time (though once a difference is found, the code gets more complex); for long strings that are largely similar, that is a big win.
So, in the end, there is no substitute for actually running the code, using clock-cycle timing (RDTSC instruction on x86 processors, for example), or running a large loop to execute the code many times to give a reasonable length runtime.
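For example, a minimal cycle-timing sketch (x86 only; __rdtsc() is a GCC/Clang intrinsic from x86intrin.h, and the function under test is a placeholder):

#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), GCC/Clang on x86 */

/* times one call to fn() in clock cycles; in practice, loop many
   times and take the minimum or the average */
uint64_t time_cycles(void (*fn)(void))
{
    uint64_t start = __rdtsc();
    fn();
    return __rdtsc() - start;
}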
If your code isn't supposed to run on a single embedded target, you probably want to run the code on a set of different hardware to determine if the code that is faster on processor A is also faster on B, C and D type processors. Often this does work, but sometimes you can find that a particular processor model is faster for SOME operations, and another is faster for another (for example based on cache-size, etc).
It would also be very important, in the case of string operations, to try inputs of different sizes and with different points of difference (e.g. a long string where the difference occurs early, vs. a long string where the difference occurs late). Sometimes the different approaches will show different results for short vs. long strings or early vs. late points of difference (and of course for equal strings, both long and short).
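A small sketch of building such inputs (the filler byte and the choice of a single difference point are illustrative):

#include <string.h>

/* build two strings of the given length that differ only at diff_at;
   each buffer must hold len + 1 bytes */
void make_pair(char *a, char *b, size_t len, size_t diff_at)
{
    memset(a, 'x', len);
    memcpy(b, a, len);
    a[len] = b[len] = '\0';
    if (diff_at < len)
        b[diff_at] = 'y';   /* the single point of difference */
}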
To round out all the comments: I found a book called "A Guide to Experimental Algorithmics" by Catherine C. McGeoch (Amazon), and a professor recommended a practical paper (pdf) to me.
I need to make a hash table that can eventually be used to write a full assembler.
Basically I will have something like:
foo 100,
and I will need to hash foo and then store the 100 (the address of the command). I was thinking I should just use a 2D array. The second dimension of the array would only be accessed when recording the address (just an int) or when returning the address. There would be no searching done in the second dimension.
If I implement the hash table this way, would it be inefficient? If it is very inefficient, what would be a better way to implement the table?
Edit: I haven't written any code yet. In fact I don't even know what language I'm going to use yet. I want to write it in C so it will be more of a challenge, but I might write it in Java if I feel pressured for time.
If you have every other int in the array unused then in addition to memory waste you're going to use the cache poorly as the cache lines will be underused.
But normally I wouldn't worry about such things when writing an assembler, as it's not as performance-demanding as, say, graphics or heavy computation. At the very least, I wouldn't rush into optimizing too early.
It is, however, important to keep in mind that once you start assembling large pieces of code (~100,000 lines of assembly) generated automatically (say, from C/C++ code by a compiler), performance will become more and more important as the user experience (wait times) degrades. At that point there will be many candidates for optimization: I/O, parsing, symbol lookup, and generating the shortest possible jump instructions when they have multiple encodings for shorter and longer jumps. Expressions and macros will contribute too. You may even consider minimizing white space and comments in the input assembly code in the first place.
Without being able to see any code, there is no reason that this would have to be inefficient. The only reason that it could be is if you pre allocated a bunch of memory that you did not end up using, however without seeing your algorithm you had in mind it is impossible to tell.
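For what it's worth, here is a minimal sketch of the kind of symbol table under discussion: a chained hash table mapping a label to its address. The bucket count and the djb2-style hash are illustrative choices, and strdup is POSIX rather than ISO C:

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024

struct symbol {
    char *name;
    int address;
    struct symbol *next;   /* chaining resolves collisions */
};

static struct symbol *table[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 5381;     /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void define_symbol(const char *name, int address)
{
    struct symbol *sym = malloc(sizeof *sym);   /* error check omitted */
    sym->name = strdup(name);                   /* POSIX */
    sym->address = address;
    unsigned h = hash(name);
    sym->next = table[h];
    table[h] = sym;
}

int lookup_symbol(const char *name, int *address)
{
    struct symbol *sym;
    for (sym = table[hash(name)]; sym; sym = sym->next) {
        if (strcmp(sym->name, name) == 0) {
            *address = sym->address;
            return 1;      /* found */
        }
    }
    return 0;              /* not found */
}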
I have a file that's 21056 bytes.
I've written a program in C that reads the entire file into a buffer, and then uses multiple search algorithms to search the file for a token that's 82 chars.
I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. I've used: KMP, BM, TBM, and Horspool. And then I used strstr and benchmarked each one.
What I'm wondering is why strstr outperforms all the other algorithms every time. The only one that is sometimes faster is BM.
Shouldn't strstr be the slowest?
Here's my benchmark code with an example of benchmarking BM:
#include <windows.h>   /* QueryPerformanceCounter, QueryPerformanceFrequency */

double get_time(void)
{
    LARGE_INTEGER t, f;
    QueryPerformanceCounter(&t);
    QueryPerformanceFrequency(&f);
    return (double)t.QuadPart / (double)f.QuadPart;
}
double before = get_time();
BM(token, strlen(token), buffer, len);
double after = get_time();
printf("Time: %f\n\n", after - before);
Could someone explain to me why strstr is outperforming the other search algorithms? I'll post more code on request if needed.
Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better, in which case you don't stand a chance of out-performing it in C on such small benchmarks.
(The reason I think this is likely is that programmers love to implement such things.)
Horspool, KMP et al are optimal at minimizing the number of byte-comparisons.
However, that's not the bottleneck on a modern processor. On an x86/64 processor, your string is being loaded into L1 cache in cache-line-width chunks (typically 64 bytes). No matter how clever your algorithm is, unless it gives you strides larger than that, you gain nothing; and the more complicated Horspool code (with at least one table lookup per step) can't compete.
Furthermore, you're stuck with the "C" string constraint of null-termination: SOMEWHERE the code has to examine every byte.
strstr() is expected to be optimal for a wide range of cases; e.g. searching for tiny strings like "\r\n" in a short string, as well as much longer ones where some smarter algorithm might have a hope. The basic strchr/memcmp loop is pretty hard to beat across the whole range of likely inputs.
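That basic loop looks roughly like this (a sketch only, not any particular libc's implementation):

#include <string.h>

char *simple_strstr(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    size_t h = strlen(hay);

    if (n == 0)
        return (char *)hay;
    if (h < n)
        return NULL;
    /* last position where a full match could still start */
    const char *last = hay + (h - n);
    const char *p;
    for (p = hay; (p = strchr(p, needle[0])) != NULL && p <= last; p++)
        if (memcmp(p, needle, n) == 0)
            return (char *)p;
    return NULL;
}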
Pretty much all x86-compatible processors since 2003 have supported SSE2. If you disassemble glibc's strlen() for x86, you may notice that it uses SSE2 PCMPEQ and MOVMASK operations to search for the NUL terminator 16 bytes at a time. The solution is so efficient that it beats the obvious super-simple loop for anything longer than the empty string.
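A sketch of that idea using intrinsics (not glibc's actual code; __builtin_ctz is GCC/Clang-specific, and the aligned loads rely on a 16-byte-aligned block never straddling a page boundary):

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

size_t sse2_strlen(const char *s)
{
    /* round s down to a 16-byte boundary so aligned loads are safe */
    const __m128i *chunk = (const __m128i *)((uintptr_t)s & ~(uintptr_t)15);
    const __m128i zero = _mm_setzero_si128();
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(chunk), zero));

    mask &= ~0u << ((uintptr_t)s & 15);   /* ignore bytes before s */
    while (mask == 0) {                   /* no NUL in this 16-byte chunk */
        chunk++;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(chunk), zero));
    }
    /* lowest set bit of the mask marks the NUL byte */
    return (size_t)((const char *)chunk + __builtin_ctz(mask) - s);
}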
I took that idea and came up with a strstr() that beats glibc's strstr() for all cases greater than 1 byte --- where the relative difference is pretty much moot. If you're interested, check out:
Convergence SSE2 and strstr()
A better strstr() with no ASM code
If you care to see a non-SSE2 solution that dominates strstr() for target strings of more than 15 bytes, check out:
which makes use of multibyte comparisons rather than strchr(), to find a point at which to do a memcmp.
BTW you've probably figured out by now that the x86 REP SCASB/REP CMPSB ops fall on their ass for anything longer than 32 bytes, and are not much of an improvement for shorter strings. Wish Intel had devoted a little more attention to that than to adding the SSE4.2 "string" ops.
For strings large enough to matter, my perf tests show that BNDM is better than Horspool across the board. BNDM is more tolerant of "pathological" cases, such as targets that heavily repeat the last byte of a pattern. BNDM can also make use of SSE2 (128-bit registers) in a way that competes with 32-bit registers in efficiency and start-up cost. Source code here.
Without seeing your code, it's hard to say exactly. strstr is heavily optimized, and usually written in assembly language. It does things like reading data 4 bytes at a time and comparing them (bit-twiddling if necessary if the alignment isn't right) to minimize memory latency. It can also take advantage of things like SSE to load 16 bytes at a time. If your code is only loading one byte at a time, it's probably getting killed by memory latency.
Use your debugger and step through the disassembly of strstr -- you'll probably find some interesting things in there.
Imagine you want to get something cleaned. You could just clean it yourself, or you could hire ten professional cleaners to clean it. If the cleaning job is an office building, the latter solution would be preferable. If the cleaning job was one window, the former would be preferable.
You never get any payback for the time spent setting up to do the job efficiently because the job doesn't take very long.