I have the choice to do either a pointer comparison or a strcmp.
I know that the string is never longer than 8 characters and I'm targeting a 64-bit platform.
Will they perform equally well, or will one of them be a better choice? I can imagine that this might differ between platforms and compilers; if so, I'd like to know the platform/compiler specifics.
Regards,
Coen
A pointer comparison will almost certainly be faster, as it is a single comparison of two pointers (possibly loading one or both into registers), whereas strcmp, even if inlined and even if the first bytes differ (the best case), will require dereferencing both pointers. If strcmp isn't inlined then there's a function call and return, and if the first bytes don't differ (and aren't both NUL) then there are multiple dereferences.
For more insight into this, I suggest looking at the assembler output of your program using both methods.
Note: I'm assuming that your claim "I have the choice to do either a pointer comparison or a strcmp" is correct, which will only be the case if your strings are all known to have unique content.
The first question should be: Is this comparison the critical path in my executable? If not, the performance question might be irrelevant, because the impact may be so minor that it doesn't matter.
Comparing the pointers is only a subset of strcmp, because you don't know whether the string values are the same if they happen to be at different memory locations. You may have to consider that in your design.
A pointer comparison is certainly faster. However, if you have a guaranteed string length of 8 bytes, you can compare the strings without strcmp by using a data type that is 8 bytes wide and can be compared directly. This way you get essentially the same speed as a pointer comparison AND compare the string contents as well. But of course, this is only reliable if you make sure that all strings are 8 bytes, padding shorter ones with zeroes.
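A minimal sketch of that idea (the type and function names here are made up for illustration; it assumes every key is exactly 8 bytes, zero-padded if the text is shorter):

#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte key type; shorter text must be zero-padded. */
typedef struct {
    char bytes[8];
} key8;

static int key8_equal(const key8 *a, const key8 *b)
{
    uint64_t x, y;
    /* memcpy sidesteps alignment and strict-aliasing concerns; compilers
       typically turn each of these into a single 64-bit load. */
    memcpy(&x, a->bytes, sizeof x);
    memcpy(&y, b->bytes, sizeof y);
    return x == y;
}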
Two strings (even short ones of 8 char) can be equal but at different addresses, so comparing pointers is not the same as using strcmp.
But your application might do hash-consing or string interning, that is, keep a single canonical copy of each string (e.g. like GLib quarks), in which case comparing pointers is enough.
And you should not bother that much about performance unless you measure it. Notice that some compilers (at high enough optimization levels) are able to optimize strcmp calls quite well.
Addendum:
If your strings are not real arbitrary strings but 8 bytes, you might declare them with a union (which the compiler will suitably align and perhaps optimize).
typedef union {
    char eightbytes[8];
    int64_t sixtyfourbits;   /* int64_t comes from <stdint.h> */
} mytype_t;
then you might initialize
mytype_t foo = {.eightbytes="Foo"};
If you are sure that the strings are zero-padded (as the above initialization does; but if you heap-allocate them, you need to zero them before filling, e.g. with strncpy(p->eightbytes, somestring, 8), etc.), you could compare foo.sixtyfourbits == foo2.sixtyfourbits ...
But I find such code exceedingly bad taste. If you really want to code this way, add a lot of explanatory comments. I believe that coding this way makes your code unreadable and unmaintainable, for a probably very tiny performance benefit.
Related
Is strcmp slower than strncmp, given that one can pass a pre-calculated string length to strncmp, but strcmp does not receive such information?
I am writing an interpreter. I am aware that these functions are both optimized. I wonder which will be the better approach (in terms of performance), as I will do a scan anyway and will know offset positions and hence lengths.
They do different things, so comparing them directly does not make sense. strncmp compares the first n (or fewer, if the string ends sooner) characters of a string. strcmp compares whole strings. If n is sufficiently large that strncmp will compare the whole strings (so that the behavior becomes effectively the same as strcmp) then strncmp is likely to be moderately slower because it also has to keep track of a counter, but the difference might or might not be measurable or even present in a given implementation. For example an implementation of strcmp could just pass SIZE_MAX as the value for n to strncmp.
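As a rough illustration of that last point, a conforming strcmp could, at least in principle, be written on top of strncmp (this is a sketch, not any particular library's actual implementation):

#include <stdint.h>
#include <string.h>

int my_strcmp(const char *a, const char *b)
{
    /* SIZE_MAX is larger than any possible string, so the count never
       cuts the comparison short; both strings stop at their NUL. */
    return strncmp(a, b, SIZE_MAX);
}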
There is only one way to know: benchmark it. Speculation is of no use.
Be sure to do that with a sufficiently large number of strings and in representative conditions (statistical distribution of string lengths and statistical distribution of matching prefix lengths).
My bet is that there will be no significant difference.
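A minimal benchmarking sketch along those lines (everything here is illustrative: the string count, lengths, and distribution should be replaced by your interpreter's real workload):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NSTRINGS 100000
#define MAXLEN   64

int main(void)
{
    static char a[NSTRINGS][MAXLEN], b[NSTRINGS][MAXLEN];

    /* Identical random strings: both functions scan to the end (worst case).
       Adjust lengths and shared prefixes to match your real data. */
    for (int i = 0; i < NSTRINGS; i++) {
        int len = 1 + rand() % (MAXLEN - 1);
        for (int j = 0; j < len; j++)
            a[i][j] = b[i][j] = 'a' + rand() % 26;
        a[i][len] = b[i][len] = '\0';
    }

    volatile int sink = 0;   /* prevents the calls from being optimized away */

    clock_t t0 = clock();
    for (int i = 0; i < NSTRINGS; i++)
        sink += strcmp(a[i], b[i]);
    clock_t t1 = clock();
    for (int i = 0; i < NSTRINGS; i++)
        sink += strncmp(a[i], b[i], MAXLEN);
    clock_t t2 = clock();

    printf("strcmp:  %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("strncmp: %f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return sink & 1;
}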
You state that performance is a problem, so let's concentrate on that.
Implementations of library functions vary from compiler vendor to compiler vendor, and also across versions of the same compiler or development environment. Thus, Yves Daoust is correct when he says "there is only one way to know: benchmark it."
I would go further and suggest that if you haven't profiled your code, you start by doing that. The bottlenecks are all too often in surprising places you'd not expect.
It may do some good, however, to compare the implementations of strcmp() and strncmp() if you have the source code.
I once found myself in very nearly the same situation you are in. (Writing a front end information display that used multiple character based terminal backends to do its job. It required repeated near-real-time parsing of several text buffers.) The Borland compiler we were using at the time had an inefficient strncmp(). Since the processor had a length-limited instruction for comparing character buffers, I wrote a specialized variant of strncmp using assembler. "Before and after" benchmarks and profiling revealed we'd removed the primary bottleneck.
Several years later when folks went back to improve and modernize that system, the compiler and its library had changed (and the processors upgraded): there was no longer any real need for the (now obsolete) special version. New benchmarks also revealed that the bottlenecks had moved due to changing compilers, necessitating different optimizations.
In the program I'm working on, this particular operation is definitely not going to be the bottleneck, but it did get me thinking. From the answers to questions such as this one and this one I've learned two ways to easily (efficiently) set all the elements of an array to zero in C:
double myArray[3];
static const double zeroes[3] = {0};
memcpy(myArray, zeroes, sizeof(zeroes));
and
double myArray[3];
memset(myArray, 0, numberOfElementsInMyArray * sizeof(myArray[0]));
Before I move on to my real question: I'm not entirely sure, but based on the information I've read, I assume this method would, at least in principle, fill the array with int zeroes (well, unsigned chars, but these seem to be fairly equivalent). Is that correct? If so, is an explicit conversion of the int zeroes to double zeroes necessary, or is it done implicitly if myArray is declared as an array of doubles?
Anyway, my real question is this: if the array isn't very big at all (like the myArray I've declared above), is either of these methods still preferred over a little loop? What if you have a few arrays of the same small size that all need to be assigned zeroes? If commented properly, do you think readability is a factor in the decision and favours a particular solution?
Just to be entirely clear: I am not looking to initialize an array to zeroes.
If it's just a small array (like three elements), it probably won't make much difference whether you use mem* functions, a loop, or three distinct assignments. In fact, the latter may even be faster as you're not suffering the cost of a function call:
myArray[0] = myArray[1] = myArray[2] = 0;
But, even if one is faster, the difference would probably not be worth worrying about. I tend to optimise for readability first and then, if needed, optimise for space/storage later.
If it was a choice between memcpy and memset, I'd choose the latter (assuming, as seems to be the case, that the all-zero bit pattern actually represented 0.0 in your implementation) for two reasons:
it doesn't require storage of a zeroed array; and
the former will get you into trouble if you change the size of one array and forget the other.
And, for what it's worth, your memset solution doesn't need to have the multiplication. Since you can get the size of the entire array, you can just do:
memset (myArray, 0, sizeof (myArray));
I think the first method, setting the array without using a loop, is better for performance.
What happens is that the array's memory is filled with zero bytes in one bulk operation, so it is faster than a loop that assigns each element individually.
I'm going through O'Reilly's Practical C Programming book, and having read the K&R book on the C programming language, and I am really having trouble grasping the concept behind unions.
They take the size of the largest data type that makes them up...and the most recently assigned one overwrites the rest...but why not just use / free memory as needed?
The book mentions that it's used in communication, where you need to set flags of the same size; and on a googled website, that it can eliminate odd-sized memory chunks...but is it of any use in a modern, non-embedded memory space?
Is there something crafty you can do with it and CPU registers? Is it simply a hold over from an earlier era of programming? Or does it, like the infamous goto, still have some powerful use (possibly in tight memory spaces) that makes it worth keeping around?
Well, you almost answered your question: Memory.
Back in the day, memory was scarce, and even saving a few kilobytes was useful.
But even today there are scenarios where unions are useful. For example, if you'd like to implement some kind of variant data type, the easiest way to do that is with a union.
This doesn't sound like much, but let's assume you want a variable that stores either a 4-character string (like an ID) or a 4-byte number (which could be some hash, or indeed just a number).
If you use a classic struct, this would be 8 bytes long (at least; if you're unlucky there are padding bytes as well). Using a union it's only 4 bytes. So you're saving 50% of the memory, which isn't a lot for one instance, but imagine having a million of these.
While you can achieve similar things by casting or subclassing, a union is still the easiest way to do this.
One use of unions is having two variables occupy the same space, while a second variable in the struct decides which data type you want to read it as.
E.g. you could have a boolean 'isDouble' and a union 'doubleOrLong' which holds both a double and a long. If isDouble == true, interpret the union as a double; otherwise interpret it as a long.
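A minimal sketch of that pattern (the names isDouble and doubleOrLong are just the ones used above):

#include <stdbool.h>

struct number {
    bool isDouble;              /* the tag: says which union member is valid */
    union {
        double d;
        long   l;
    } doubleOrLong;
};

static double as_double(const struct number *n)
{
    /* Only read the member the tag says was last written. */
    return n->isDouble ? n->doubleOrLong.d : (double)n->doubleOrLong.l;
}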
Another use of unions is accessing data types in different representations. For instance, if you know how a double is laid out in memory, you could put a double in a union, access it as a different data type like a long, directly access its bits, its mantissa, its sign, its exponent, whatever, and do some direct manipulation with it.
You don't really need this nowadays since memory is so cheap, but in embedded systems it has its uses.
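For example, a sketch of pulling the raw bits out of a double through a union (this assumes an 8-byte IEEE 754 double and an 8-byte uint64_t; the exact field layout is implementation-dependent):

#include <stdint.h>
#include <stdio.h>

union double_bits {
    double   d;
    uint64_t bits;
};

int main(void)
{
    union double_bits v;
    v.d = -1.5;
    /* For an IEEE 754 double on typical platforms: bit 63 is the sign,
       bits 62..52 the exponent, bits 51..0 the mantissa. */
    unsigned sign     = (unsigned)(v.bits >> 63);
    unsigned exponent = (unsigned)((v.bits >> 52) & 0x7FF);
    uint64_t mantissa = v.bits & ((1ULL << 52) - 1);
    printf("sign=%u exponent=%u mantissa=%llu\n",
           sign, exponent, (unsigned long long)mantissa);
    return 0;
}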
The Windows API makes use of unions quite a lot. LARGE_INTEGER is an example of such a usage. Basically, if the compiler supports 64-bit integers, use the QuadPart member; otherwise, set the low DWORD and the high DWORD manually.
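From memory, the declaration in the Windows headers looks roughly like this (simplified; DWORD, LONG and LONGLONG are Windows typedefs, and the real header has an extra anonymous struct, so treat this as a sketch rather than the authoritative definition):

typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;      /* low 32 bits  */
        LONG  HighPart;     /* high 32 bits */
    } u;
    LONGLONG QuadPart;      /* the full 64-bit value, if the compiler supports it */
} LARGE_INTEGER;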
It's not really a holdover, as the C language was created in 1972, when memory was a real concern.
You could make the argument that in a modern, non-embedded space, you might not want to use C as a programming language to begin with. If you've chosen C as your implementation language, you're looking to harness the benefits of C: it's efficient and close to the metal, which results in tight, fast binaries.
As such, when choosing to use C, you'd still want to take advantage of its benefits, which include memory-space efficiency. To that end, a union works very well, allowing you some degree of type safety while enforcing the smallest memory footprint available.
One place where I have seen it used is in the Doom 3/idTech 4 Fast Inverse Square Root implementation.
For those unfamiliar with this algorithm, it essentially requires treating a floating point number as an integer. The old Quake (and earlier) version of the code does this by the following:
float y = 2.0f;
// treat the bits of y as an integer
long i = * ( long * ) &y;
// do some stuff with i
// treat the bits of i as a float
y = * ( float * ) &i;
original source on GitHub
This code takes the address of a floating point number y, casts it to a pointer to a long (i.e., a 32-bit integer in Quake days), and dereferences it into i. Then it does some incredibly bizarre bit-twiddling stuff, and the reverse.
There are two disadvantages of doing it this way. One is that the convoluted address-of, cast, dereference process forces the value of y to be read from memory, rather than from a register1, and ditto on the way back. On Quake-era computers, however, floating point and integer registers were completely separate so you pretty much had to push to memory and back to deal with this restriction.
The second is that, at least in C++, doing such casting is deeply frowned upon, even when doing what amounts to voodoo such as this function does. I'm sure there are more compelling arguments, however I'm not sure what they are :)
So, in Doom 3, id included the following bit in their new implementation (which uses a different set of bit twiddling, but a similar idea):
union _flint {
    dword i;
    float f;
};
...
union _flint seed;
seed.i = /* look up some tables to get this */;
double r = seed.f; // <- access the bits of seed.i as a floating point number
original source on GitHub
Theoretically, on an SSE2 machine, this can be accessed through a single register; I'm not sure in practice whether any compiler would do this. It's still somewhat cleaner code in my opinion than the casting games in the earlier Quake version.
1 - ignoring "sufficiently advanced compiler" arguments
I have defined a union as follows:
union {
uintptr_t refcount;
struct slab_header *page;
} u;
The page pointer is guaranteed to be aligned on a page boundary (most probably 4096), and is never going to be NULL. This implies that the lowest possible address is going to be 4096.
refcount will be within 0 .. 4095.
Upon the creation of the enclosing struct, I can either have u.refcount = 0 or u.page = mmap(...).
The code around this union is going to be something like that:
if (u.refcount < 4096) {
/* work with refcount, possibly increment it */
} else {
/* work with page, possibly dereference it */
}
Is this always guaranteed to work on a fully POSIX-compliant implementation? Is it ever possible that uintptr_t and struct slab_header * have different representations, so that, for example, when u.page == 8192, u.refcount < 4096 yields true?
I don't think that it's "always guaranteed to work", because:
uintptr_t is optional (7.18.1.4).
A void * can be converted to uintptr_t and back (7.18.1.4). It's not guaranteed that it is the case with struct slab_header*. A void * has the same representation and alignment requirements as a pointer to a character type. Pointers to structures needn't have the same representation or alignment (6.2.5 27). Even if this was not the case, nothing guarantees sizeof(uintptr_t) == sizeof(void *), it could obviously be larger and still satisfy the requirement of being convertible to void * in the typical case of homogeneous pointers.
Finally, even if they have the same size and are convertible, it's possible the representation of the pointer values differs in a strange way from that of unsigned integers. The representation of unsigned integers is relatively constrained (6.2.6.2 1), but no such constraints exist on pointers.
Therefore, I'd conclude the best way would be to have a separate tag member that tells the state.
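A hedged sketch of that alternative (the extra member costs a little space but removes the guesswork):

#include <stdint.h>

struct slab_header;             /* as in the question */

struct ref_or_page {
    int is_page;                /* the tag: 0 = refcount is valid, 1 = page is valid */
    union {
        uintptr_t refcount;
        struct slab_header *page;
    } u;
};

/* Callers branch on the tag instead of on a magic value range:
   if (x.is_page) ... x.u.page ... else ... x.u.refcount ... */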
I'm going to answer a different question -- "is this a good idea?" The more significant worry I see from your code is aliasing issues. I would be unsurprised (in fact, I would mildly expect it) if it were possible to write a piece of code that had the effect of
Write something to u.refcount
Write something to u.page
Read u.refcount
Discover the value you read is the same as what you first wrote
You may scoff, but I've seen this happen -- and if you don't appreciate the danger, it will take a very long time to debug.
You may be safe with this particular union; I'll leave it to someone with more experience with this sort of thing (and/or a copy of the C standard handy) to make that judgment. But I want to emphasize that this is something important to worry about!!!
There is another cost. Using the union in combination with a "magic" test to discover what is stored in it -- especially one using system-specific facts -- is going to make your code harder to read, debug, and maintain. You can take steps to mitigate that fact, but it will always be an issue.
And, of course, the cost of your code simply not working when someone tries to use it on a weird machine.
The right solution is probably to structure your code so that the only code that cares about the data layout is a couple of tiny inlined access routines; that way you can easily swap how things are stored. Organize your code to use a compile-time flag to choose which one!
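A sketch of what that might look like (everything here is made up for illustration; USE_COMPACT_LAYOUT is a hypothetical flag):

#include <stdint.h>

struct slab_header;

#ifdef USE_COMPACT_LAYOUT
/* Compact layout: relies on page alignment, as in the question. */
struct handle { union { uintptr_t rc; struct slab_header *pg; } u; };
static inline int handle_is_page(const struct handle *h) { return h->u.rc >= 4096; }
static inline uintptr_t handle_refcount(const struct handle *h) { return h->u.rc; }
#else
/* Portable layout: an explicit tag. */
struct handle { int is_page; union { uintptr_t rc; struct slab_header *pg; } u; };
static inline int handle_is_page(const struct handle *h) { return h->is_page; }
static inline uintptr_t handle_refcount(const struct handle *h) { return h->u.rc; }
#endif

/* The rest of the program only ever calls handle_is_page() and
   handle_refcount(), so swapping layouts is a one-flag change. */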
Now that I've said all that, the question I really want to ask (and want to get you in the habit of considering): "is it worth it?" You're making sacrifices in your time, code readability, ease of programming, and portability. What are you gaining?
A lot of people forget to ask this question, and they write complex, buggy code when simple, easy-to-write-and-read code is just as good.
If you've discovered this change is going to double the performance of a time-intensive routine, then it's probably worth dealing with this hack. But if you're investigating this simply because it's a clever trick to save 8 bytes, then you should consider it only as an intellectual exercise.
I think that <stdint.h> and the C99 standard guarantee that a void * can be converted to uintptr_t and back without loss.
Being page aligned for a pointer is an implementation detail.
There shouldn't be any problem with that code. But just in case, you can add a check at the start of your program:
if (sizeof(uintptr_t) != sizeof(struct slab_header *)) {
    /* needs <stdio.h> and <stdlib.h> */
    fprintf(stderr, "uintptr_t and struct slab_header * differ in size\n");
    abort();
}
But something seems off (or unclear): do you want "refcount" to be 0 whenever you have a page, and non-zero when the page is NULL?
Is this always guaranteed to work?
No. You can't rely on reading a member of a union other than the last one written, and optimizers do take that into account, so this isn't just a theoretical problem.
Is it ever possible that uintptr_t and struct slab_header * have different representations,
so that, for example, when u.page == 8192, u.refcount < 4096 yields true?
It is also theoretically possible. I can't think of a current implementation where it occurs.
Even though I am a long time C programmer, I only recently learned that one can directly assign structure variables to one another instead of using memcpy:
struct MyStruct a,b;
...
a = b; /* implicit memcpy */
Though this feels a bit "high-level" for C, it is definitely useful. But why can't I do equality and inequality comparison:
if (a == b) ...
if (a != b) ...
Is there any good reason for the standard to exclude this? Or is this an inconsistency in the - otherwise very elegant - standard?
I don't see why I can replace my memcpy's with clean assignments, but I have to keep those ugly memcmp's in place.
Per the comp.lang.c FAQ:
There is no good way for a compiler to implement structure comparison (i.e. to support the == operator for structures) which is consistent with C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused "holes" in the structure (such padding is used to keep the alignment of later fields correct). A field-by-field comparison might require unacceptable amounts of repetitive code for large structures. Any compiler-generated comparison could not be expected to compare pointer fields appropriately in all cases: for example, it's often appropriate to compare char * fields with strcmp rather than ==.
If you need to compare two structures, you'll have to write your own function to do so, field by field.
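For example, a hand-written comparison for a hypothetical struct might look like this (memcmp is deliberately avoided because of the padding issue mentioned above):

#include <string.h>

struct person {
    int    id;
    char  *name;                /* compared by content, not by pointer */
    double height;
};

int person_equal(const struct person *a, const struct person *b)
{
    return a->id == b->id
        && strcmp(a->name, b->name) == 0
        && a->height == b->height;
}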