I would like to know the difference in performance between these two pieces of code. What are the advantages and disadvantages of each?
Code 1:
temp = a;
a = b;
b = temp;
Code 2:
a = a + b;
b = a - b;
a = a - b;
The advantages of the first technique are that it is a universal idiom which is obvious and correct. It will work everywhere, on variables of any type. It is quite likely to be recognized by an optimizing compiler and replaced by an actual 'swap' instruction, if available. So besides being more clear and more correct, the first technique is likely to be more efficient, also.
The advantages of the second technique are that it avoids the use of a temporary variable, and that it is a deliciously obscure trick which is beloved by those who incessantly collect obscure tricks, and pose misguided "gotcha" interview questions involving obscure tricks, and (for all I know) who make their own programs less maintainable, less portable, and less reliable by cluttering them with obscure tricks.
The disadvantages of the first technique are: None.
(Theoretically, one might say there's a disadvantage in that it uses a temporary variable, but really, that's no disadvantage at all, because temporary variables are free. I don't think there's anyone on the planet who is still coding for a processor so limited in memory and registers that "saving" a temporary variable in this sort of way is something to actually worry about.)
The disadvantages of the second technique are that it is harder to write, harder for the reader to understand, and likely less efficient (perhaps significantly so). It "works" only on arithmetic types, not structures or other types. It won't work (it will quietly corrupt data) if it should happen to be used in an attempt to swap data with itself. (More on this possibility later.) And if those aren't all bad enough, it is likely to be fundamentally buggy even under "ordinary" circumstances, since it could overflow, and with floating-point types it could alter one or both values slightly due to roundoff error, and with pointer types it's undefined if the pointers being swapped do not point within the same object.
You asked specifically about performance, so let's say a few more words about that. (Disclaimer: I am not an expert on microoptimization; I tend to think about instruction-level performance in rather abstract, handwavey terms.)
The first technique uses three assignments. The second technique uses an addition and two subtractions. On many machines an arithmetic operation takes the same number of cycles as a simple value assignment, so in many cases the performance of the two techniques will be identical. But it's hard to imagine how the second technique could ever be more efficient, while it's easy to imagine how the first technique could be more efficient. In particular, as I mentioned already, the first technique is easier for a compiler to recognize and turn into a more-efficient SWP instruction, if the target processor has one.
And now, some digressions. The second technique as presented here is a less-delicious variant of the traditional, deliciously obscure trick for swapping two variables without using a temporary. The traditional, deliciously obscure trick for swapping two variables without using a temporary is:
a ^= b;
b ^= a;
a ^= b;
Once upon a time it was fashionable in some circles to render these techniques in an even more deliciously obscure way:
a ^= b ^= a ^= b; /* WRONG */
a += b -= a -= b; /* WRONG */
But these renditions (while, yes, being absolutely exquisitely deliciously obscure if you like that sort of thing) have the additional crashing disadvantage that they represent undefined behavior, since they try to modify a multiple times in the same expression without an intervening sequence point. (See also the canonical SO question on that topic.)
In fairness, I have to mention that there is one actual circumstance under which the first technique's use of a temporary variable can be a significant disadvantage, and the second technique's lack of one can therefore be an actual advantage. That one circumstance is if you are trying to write a generic 'swap' macro, along the lines of
#define Swap(a, b) (a = a + b, b = a - b, a = a - b)
The idea is that you can use this macro anywhere, and on variables of any type, and (since it's a macro, and therefore magic) you don't even have to use & on the arguments you call it with, as you would if it were a function. But in traditional C, at least, if you wanted to write a Swap macro like this, it was essentially impossible to do so using technique 1, because there was no way to declare the necessary temporary variable.
You weren't asking about this sub-problem, but since I brought it up, I have to say that the solution (although it is eternally frustrating to the lovers of delicious obscurity) is to just not attempt to write a "generic" macro to swap two values in the first place. You can't do it in C. (As a matter of fact, you could do it in C++, with the new definition of auto, and these days I guess C has some new way of writing generic macros, too.)
And there is actually an additional, crashing problem when you try to write a 'swap' macro this way, which is that it will not work — it will set one or both variables to 0 instead of swapping the values — if the caller ever tries to swap a value with itself. You might say that's not a problem, since maybe nobody would ever write Swap(x, x), but in a less-than-perfectly-optimal sorting routine they might very easily write Swap(a[i], a[j]) where sometimes i happened to be equal to j, or Swap(*p, *q) where sometimes the pointer p happened to be equal to q.
See also the C FAQ List, questions 3.3b, 10.3 and 20.15c.
Always use the first one. The second one can introduce subtle bugs. If the variables are of type int and a + b exceeds INT_MAX, the addition overflows, which is undefined behavior.
When it comes to performance, the difference is likely barely measurable.
I'm not trying to replicate the usual question about C not being able to return arrays but to dig a bit more deeply into it.
We cannot do this:
char f(void)[8] {
    char ret[8];
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    char obj_a[8];
    obj_a = f();
}
But we can do:
struct s { char arr[10]; };

struct s f(void) {
    struct s ret;
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    struct s obj_a;
    obj_a = f();
}
So, I was skimming the ASM code generated by gcc -S, and it seems to work with the stack, addressing -x(%rbp) just as with any other C function return.
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
Extra data: I am using Linux and gcc on a x64 Intel.
First of all, yes, you can encapsulate an array in a structure, and then do anything you want with that structure (assign it, return it from a function, etc.).
Second of all, as you've discovered, the compiler has little difficulty emitting code to return (or assign) structures. So that's not the reason you can't return arrays, either.
The fundamental reason you cannot do this is that, bluntly stated, arrays are second-class data structures in C. All other data structures are first-class. What are the definitions of "first-class" and "second-class" in this sense? Simply that second-class types cannot be assigned.
(Your next question might be, "Other than arrays, are there any other second-class data types?", and I think the answer is "Not really, unless you count functions".)
Intimately tied up with the fact that you can't return (or assign) arrays is that there are no values of array type, either. There are objects (variables) of array type, but whenever you try to take the value of one, you immediately get a pointer to the array's first element. [Footnote: more formally, there are no rvalues of array type, although an object of array type can be thought of as an lvalue, albeit a non-assignable one.]
So, quite aside from the fact that you can't assign to an array, you can't even generate a value to try to assign. If you say
char a[10], b[10];
a = b;
it's as if you had written
a = &b[0];
So we've got an array on the left, but a pointer on the right, and we'd have a massive type mismatch even if arrays somehow were assignable. Similarly (from your example) if we try to write
a = f();
and somewhere inside the definition of function f() we have
char ret[10];
/* ... fill ... */
return ret;
it's as if that last line said
return &ret[0];
and, again, we have no array value to return and assign to a, merely a pointer.
(In the function call example, we've also got the very significant issue that ret is a local array, perilous to try to return in C. More on this point later.)
Now, part of your question is probably "Why is it this way?", and also "If you can't assign arrays, why can you assign structures containing arrays?"
What follows is my interpretation and my opinion, but it's consistent with what Dennis Ritchie describes in his paper The Development of the C Language.
The non-assignability of arrays arises from three facts:
C is intended to be syntactically and semantically close to the machine hardware. An elementary operation in C should compile down to one or a handful of machine instructions taking one or a handful of processor cycles.
Arrays have always been special, especially in the way they relate to pointers; this special relationship evolved from and was heavily influenced by the treatment of arrays in C's predecessor language B.
Structures weren't initially in C.
Due to point 2, it's impossible to assign arrays, and due to point 1, it shouldn't be possible anyway, because a single assignment operator = shouldn't expand to code that might take N thousand cycles to copy an N thousand element array.
And then we get to point 3, which really ends up leading to a contradiction.
When C got structures, they initially weren't fully first-class either, in that you couldn't assign or return them. But the reason you couldn't was simply that the first compiler wasn't smart enough, at first, to generate the code. There was no syntactic or semantic roadblock, as there was for arrays.
And the goal all along was for structures to be first-class, and this was achieved relatively early on. The compiler caught up, and learned how to emit code to assign and return structures, around the time that the first edition of K&R went to print.
But the question remains, if an elementary operation is supposed to compile down to a small number of instructions and cycles, why doesn't that argument disallow structure assignment? And the answer is, yes, it's a contradiction.
I believe (though this is more speculation on my part) that the thinking was something like this: "First-class types are good, second-class types are unfortunate. We're stuck with second-class status for arrays, but we can do better with structs. The no-expensive-code rule isn't really a rule, it's more of a guideline. Arrays will often be large, but structs will usually be small, tens or hundreds of bytes, so assigning them won't usually be too expensive."
So a consistent application of the no-expensive-code rule fell by the wayside. C has never been perfectly regular or consistent, anyway. (Nor, for that matter, are the vast majority of successful languages, human as well as artificial.)
With all of this said, it may be worth asking, "What if C did support assigning and returning arrays? How might that work?" And the answer will have to involve some way of turning off the default behavior of arrays in expressions, namely that they tend to turn into pointers to their first element.
Sometime back in the '90's, IIRC, there was a fairly well-thought-out proposal to do exactly this. I think it involved enclosing an array expression in [ ] or [[ ]] or something. Today I can't seem to find any mention of that proposal (though I'd be grateful if someone can provide a reference). At any rate, I believe we could extend C to allow array assignment by taking the following three steps:
Remove the prohibition of using an array on the left-hand side of an assignment operator.
Remove the prohibition of declaring array-valued functions. Going back to the original question, make char f(void)[8] { ... } legal.
(This is the biggie.) Have a way of mentioning an array in an expression and ending up with a true, assignable value (an rvalue) of array type. For the sake of argument I'll posit a new operator or pseudofunction called arrayval( ... ).
[Side note: Today we have a "key definition" of array/pointer correspondence, namely that:
A reference to an object of array type which appears in an expression decays (with three exceptions) into a pointer to its first element.
The three exceptions are when the array is the operand of a sizeof operator, or a & operator, or is a string literal initializer for a character array. Under the hypothetical modifications I'm discussing here, there would be a fourth exception, namely when the array was an operand of this new arrayval operator.]
Anyway, with these modifications in place, we could write things like
char a[8], b[8] = "Hello";
a = arrayval(b);
(Obviously we would also have to decide what to do if a and b were not the same size.)
Given the function prototype
char f(void)[8];
we could also do
a = f();
Let's look at f's hypothetical definition. We might have something like
char f(void)[8] {
    char ret[8];
    /* ... fill ... */
    return arrayval(ret);
}
Note that (with the exception of the hypothetical new arrayval() operator) this is just about what Dario Rodriguez originally posted. Also note that — in the hypothetical world where array assignment was legal, and something like arrayval() existed — this would actually work! In particular, it would not suffer the problem of returning a soon-to-be-invalid pointer to the local array ret. It would return a copy of the array, so there would be no problem at all — it would be just about perfectly analogous to the obviously-legal
int g(void) {
    int ret;
    /* ... compute ... */
    return ret;
}
Finally, returning to the side question of "Are there any other second-class types?", I think it's more than a coincidence that functions, like arrays, automatically have their address taken when they are not being used as themselves (that is, as functions or arrays), and that there are similarly no rvalues of function type. But this is mostly an idle musing, because I don't think I've ever heard functions referred to as "second-class" types in C. (Perhaps they have, and I've forgotten.)
Footnote: Because the compiler is willing to assign structures, and typically knows how to emit efficient code for doing so, it used to be a somewhat popular trick to co-opt the compiler's struct-copying machinery in order to copy arbitrary bytes from point a to point b. In particular, you could write this somewhat strange-looking macro:
#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
*(struct foo *)(a))
that behaved more or less exactly like an optimized in-line version of memcpy(). (And in fact, this trick still compiles and works under modern compilers today.)
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
It has nothing to do with capability per se. Other languages do provide the ability to return arrays, and you already know that in C you can return a struct with an array member. On the other hand, yet other languages have the same limitation that C does, and even more so. Java, for instance, cannot return arrays, nor indeed objects of any type, from methods. It can return only primitives and references to objects.
No, it is simply a question of language design. As with most other things to do with arrays, the design points here revolve around C's provision that expressions of array type are automatically converted to pointers in almost all contexts. The value provided in a return statement is no exception, so C has no way of even expressing the return of an array itself. A different choice could have been made, but it simply wasn't.
For arrays to be first-class objects, you would expect at least to be able to assign them. But that requires knowledge of the size, and the C type system is not powerful enough to attach sizes to any types. C++ could do it, but doesn't due to legacy concerns—it has references to arrays of particular size (typedef char (&some_chars)[32]), but plain arrays are still implicitly converted to pointers as in C. C++ has std::array instead, which is basically the aforementioned array-within-struct plus some syntactic sugar.
Bounty hunting.
The authors of C did not aspire to be language or type system designers. They were tool designers. C was a tool to make system programming easier.
ref: Brian Kernighan on Pascal; Dennis Ritchie on C
There was no compelling case for C to do anything unexpected; especially as UNIX and C were ushering in the era of least surprise. Copying arrays, and making complex syntax to do so when it was the metaphorical equivalent of having a setting to burn the toast did not fit the C model.
Everything in C, the language, is effectively constant time, constant size. C, the standard, seems preoccupied with doing away with this core feature which made C so popular; so expect the, uh, standard C/2023.feb07 to feature a punctuation nightmare that enables arrays as r-values.
The decision of the C authors makes eminent sense if you view the programming world pragmatically. If you view it as a pulpit for treasured beliefs, then get onboard for C/2023.feb07 before C/2023.feb08 nullifies it.
I'm afraid in my mind it's not so much a debate of first or second class objects, it's a religious discussion of good practice and applicable practice for deep embedded applications.
Returning a structure either means a root structure being changed by stealth in the depths of the call sequence, or a duplication of data and the passing of large chunks of duplicated data. The main applications of C are still largely concentrated in deep embedded domains, where you have small processors that don't need to be passing large blocks of data. You also have engineering practice that requires operating without dynamic RAM allocation, with minimal stack and often no heap. It could be argued that returning a structure is the same as modification via a pointer, only abstracted in syntax... but I'd argue that's not in the C philosophy of "what you see is what you get" in the way a pointer to a type is.
Personally, I would argue you have found a loophole, whether standard-approved or not. C is designed in such a way that allocation is explicit. As a matter of good practice you pass address-bus-sized objects, ideally in one cycle, referring to memory that has been allocated explicitly at a controlled time within the developer's ken. This makes sense in terms of code efficiency and cycle efficiency, and offers the most control and clarity of purpose. In code inspection, I'm afraid I'd throw out a function returning a structure as bad practice. C does not enforce many rules; it's a language for professional engineers in that it relies upon the user enforcing their own discipline. Just because you can, doesn't mean you should... It does offer some pretty bulletproof ways to handle data of very complex size and type, utilising compile-time rigour and minimising the dynamic variations of footprint at runtime.
I have the choice to do either a pointer comparison or a strcmp.
I know that the string is never longer than 8 characters and I'm targeting a 64 bit platform.
will they perform equally well or will one of them be a better choice? I can imagine that this might differ between platforms and compilers, if so, I'd like to know the details about the platform/compiler specifics.
A pointer comparison will almost certainly be faster: it is a single comparison of two pointers (possibly after loading one or both into registers), whereas strcmp must dereference both pointers even in the best case, where it is inlined and the first bytes differ. If strcmp isn't inlined there is also a function call and return, and if the first bytes match (and aren't both NUL) there are multiple dereferences.
For more insight into this, I suggest looking at the assembler output of your program using both methods.
Note: I'm assuming that your claim "I have the choice to do either a pointer comparison or a strcmp" is correct, which will only be the case if your strings are all known to have unique content.
The first question should be: Is this comparison the critical path in my executable? If not, the performance question might be irrelevant, because the impact may be so minor that it doesn't matter.
Comparing the pointers is only a subset of strcmp, because you don't know whether the string values are the same if they happen to be at different memory locations. You may have to consider that in your design.
A pointer comparison is certainly faster. However, if you have a guaranteed string length of 8 bytes, you may compare the strings without strcmp by using a data type that is 8 bytes long and can be compared directly. This way you get essentially the speed of a pointer comparison AND also compare the contents. But of course, this is only reliable if you make sure that all strings occupy 8 bytes, padding shorter ones with zeroes.
Two strings (even short ones of 8 char) can be equal but at different addresses, so comparing pointers is not the same as using strcmp.
But your application might do hash-consing or string interning, that is, keep a canonical copy of each string (e.g. like GLib quarks).
And you should not worry that much about performance unless you measure it. Notice that some compilers (at high enough optimization levels) are able to optimize strcmp calls quite well.
addenda
If your strings are not arbitrary strings but always exactly 8 bytes, you might declare them with a union (which the compiler will suitably align and perhaps optimize):
typedef union {
    char eightbytes[8];
    int64_t sixtyfourbits;
} mytype_t;
then you might initialize
mytype_t foo = {.eightbytes="Foo"};
If you are sure that the strings are zero-padded (as the above initialization does; if you heap-allocate them, you need to zero them before filling, e.g. with strncpy(p->eightbytes, somestring, 8), etc.), you could compare foo.sixtyfourbits == foo2.sixtyfourbits ...
But I find such code exceedingly bad taste. If you really want to code this way, add a lot of explanatory comments. I believe that coding this way makes your code unreadable and unmaintainable, for a probably very tiny performance benefit.
While there are various ways to find the length of an array in C, the language doesn't provide one.
What was the reason for not including such a common operation in C or any of its revisions?
One of the guiding philosophies of C's design is that all data types map directly to memory, and attempting to store metadata for array types such as length runs counter to that philosophy.
From an article by Dennis Ritchie describing the development of C, we find this:
Embryonic C
...
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
    int inumber;
    char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
Emphasis mine. Just replace the term "pointer" with "metadata" in the passage above, and I think we have the answer to your question.
Unless someone here happens to be on the C standard committee, you're unlikely to get an authoritative answer. But two reasons I can think of:
In many (most?) situations, you don't have an array, you just have a pointer.
Storing metadata about the array increases the storage size, etc. The general rule of C is that you don't pay for what you don't use.
C is not object-oriented, so it has no concept of methods that are attached to objects. It was designed with speed and simplicity in mind, and the common idiom sizeof (array) / sizeof(array[0]) is short and straightforward.
It is down to efficiency. C is a very efficient programming language.
Array syntax in C is just a syntactic sugar for pointer arithmetic. If you want to have a real array with length and bounds checking, you can create a struct which contains an array pointer and its length and access it only through functions which check bounds.
As most C programmers know, you can't directly compare two structures.
Consider:
void isequal(MY_STRUCT a, MY_STRUCT b)
{
    if (a == b)
    {
        puts("equal");
    }
    else
    {
        puts("not equal");
    }
}
The a==b comparison will AFAIK throw a compile error on any sensible C compiler, because the C standard doesn't allow for built-in structure comparison. Workarounds using memcmp are of course a bad idea due to alignment, packing, bitfields etc., so we end up writing element by element comparison functions.
On the other hand it DOES allow for structure assignment e.g. a = b is entirely legal.
Clearly the compiler can cope with that fairly trivially, so why not comparison?
The only idea I had was that structure assignment is probably fairly close to memcpy(), as the gaps due to alignment etc. don't matter. On the other hand, a comparison might be more complicated. Or is this something I'm missing?
Obviously, I'm aware that doing a simple element by element comparison isn't necessarily enough, e.g. if the structure contains a pointer to a string, but there are circumstances where it would be useful.
As others have mentioned, here's an extract from C: A Reference Manual by Harbison and Steele:
Structures and unions cannot be compared for equality, even though assignment for these types is allowed. The gaps in structures and unions caused by alignment restrictions could contain arbitrary values, and compensating for this would impose an unacceptable overhead on the equality comparison or on all operations that modified structure and union types.
Comparison is unsupported for the same reason memcmp fails.
Due to padding fields the comparison would fail in unpredictable ways which would be unacceptable for most programmers. Assignment changes the invisible padding fields, but these are invisible anyway, so nothing unexpected there.
Obviously, you may ask: so why doesn't it just zero-fill all the padding fields ? Sure that would work but it would also make all programs pay for something they might not need.
EDIT
Oli Charlesworth notes in the comments that you may be asking: "why doesn't the compiler generate code for member-by-member comparison". If that is the case, I must confess: I don't know :-). The compiler would have all the needed information if it would only allow comparing complete types.
I found this in the C rationale (C99 rationale V5.10), 6.5.9:
The C89 Committee considered, on more than one occasion, permitting comparison of structures for equality. Such proposals foundered on the problem of holes in structures. A byte-wise comparison of two structures would require that the holes assuredly be set to zero so that all holes would compare equal, a difficult task for automatic or dynamically allocated variables. The possibility of union-type elements in a structure raises insuperable problems with this approach. Without the assurance that all holes were set to zero, the implementation would have to be prepared to break a structure comparison into an arbitrary number of member comparisons; a seemingly simple expression could thus expand into a substantial stretch of code, which is contrary to the spirit of C.
In plain English: Since structs/unions may contain padding bytes, and the committee had not enforced these to hold certain values, they wouldn't implement this feature. Because if all padding bytes must be set to zero, it would require additional run-time overhead.
Auto-generating a comparison operator is a bad idea. Imagine how comparison would work for this structure:
struct s1 {
    int len;
    char str[100];
};
This is a Pascal-like string with a maximum length of 100; a sensible comparison should look only at the first len characters of str.
Another case:
struct s2 {
    char a[100];
};
How can the compiler know how to compare the field a? If it holds a NUL-terminated string, the compiler should use strcmp or strncmp; if it is a plain char array, it must use memcmp.
To add to the existing good answers:
struct foo {
    union {
        uint32_t i;
        float f;
    } u;
} a, b;

a.u.f = -0.0;
b.u.f = 0.0;

if (a==b) // what to do?!
The problem arises inherently from unions not being able to store/track which member is current.