array notation vs pointer notation C - c

Is there any advantage to using pointer notation over array notation? I realize that there may be some special cases where pointer notation is better, but it seems to me that array notation is clearer. My professor told us that he prefers pointer notation "because it's C", but it's not something he will be marking. And I know that there are differences with declaring strings as character arrays vs declaring a pointer as a string - I'm just talking about in general looping through an array.

If you write a straightforward loop, both array and pointer forms typically compile to the same machine code.
There are differences in especially non-constant loop exit conditions, but it only matters if you are trying to optimize the loop for a specific compiler and architecture.
So, how about we consider a real world example that relies on both?
These types implement a double-precision floating-point matrix of dynamically determined size, with separate reference-counted data storage:
struct owner {
long refcount;
size_t size;
double data[]; /* C99 flexible array member */
};
struct matrix {
long rows;
long cols;
long rowstep;
long colstep;
double *origin;
struct owner *owner;
};
The idea is that when you need a matrix, you describe it using a local variable of type struct matrix. All data referred to is stored in dynamically allocated struct owner structures, in the C99 flexible array member. After you no longer need the matrix, you must explicitly "drop" it. This allows multiple matrices refer to the same data: you can even have separate row, column, or diagonal vectors, with any change to one immediately reflected in the all others (because they refer to the same data values).
When a matrix is associated with data, either by creating an empty matrix, or by referring to existing data referred to by another matrix, the owner structure refcount is incremented. Whenever a matrix is dropped, the referred to owner structure refcount is decremented. The owner structure is freed, when the refcount drops to zero. This means you only need to remember to "drop" each matrix you used, and the data referred to will be correctly managed and released as soon as possible (unneeded), but never too early.
This all assumes a single-threaded process; multithreaded handling is quite a bit more complicated.
To access element in matrix struct matrix m, row r, column c, assuming 0 <= r < m.rows and 0 <= c < m.cols, you use m.origin[r*m.rowstep + c*m.colstep].
If you want to transpose a matrix, you simply swap m.rows and m.cols, and m.rowstep and m.colstep. All that changes, is the order in which the data (stored in the owner structure) is read.
(Note that origin points to the double which appears at row 0, column 0, in the matrix; and that rowstep and colstep can be negative. This allows all kinds of weird "views" to the otherwise dull regular data, like mirrors and diagonals and so on.)
If we did not have the C99 flexible array member -- say, we only had pointers, and no array notation at all --, the owner structure data member would have to be a pointer. It would mean an additional redirection at the hardware level (slowing down the data accesses a bit). We would either need to allocate the memory pointed by data separately, or use tricks to point to an address following the owner structure itself, but suitably aligned for a double.
Multidimensional arrays do have their uses -- basically, when the sizes of all dimensions (or all but one dimension) are known --, and it's nice for the compiler to take care of the indexing, but it does not have to mean they are always easier than methods using pointers. For example, in the above matrix structure case, we can always define some helper preprocessor macros, like
#define MATRIXELEM(m, r, c) ((m).origin[(r)*(m).rowstep + (c)*(m).colstep])
which admittedly has the downside that it evaluates the first parameter, m, three times. (It means that MATRIXELEM(m++,0,0) would actually try to increment m three times.) In this particular case, m is normally a local variable of struct matrix type, which should minimize surprises. One could have e.g.
struct matrix m1, m2;
/* Stuff that initializes m1 and m2, and makes sure they point
to valid matrix data */
MATRIXELEM(m1, 0, 0) = MATRIXELEM(m2, 0, 0);
The "extra" parentheses in such macros ensure that if you use a calculation, for example i + 4*j as row, the index calculation is correct ((i + 4*j)*m.rowstep and not i + 4*j*m.rowstep). In preprocessor macros, those parentheses are not really "extra" at all. In addition to ensure the correct calculation, having the "extra" parentheses also tell other programmers that the macro writer has been careful in avoiding such arithmetic-related bugs. (I for one consider it "good form" to put the parentheses there, even in cases where they are not needed for syntax unambiquity, if it conveys that "assurance" to other developers reading the code.)
And this, after all this text, leads to my most important point: Some things are easier expressed and understood by us human programmers using array notation than pointer notation, and vice versa. "Foo"[1] is pretty obviously equal to 'o', whereas *("Foo"+1) is not nearly as obvious. (Then again, neither is 1["foo"], but you can blame the C standardization folks for that.)
Based on the examples above, I consider the two notations complementary; they do have large overlap especially in simple loops -- in which case it is okay to just pick one --, but being able to utilize both notations and pick one not based on ones proficiency in one but based on ones opinion on as to what makes most sense wrt. readability and maintainability, is an important skill for any C programmer, in my not very humble opinion.

Actually if you, say, pass an array argument to a function in C, you actually pass a pointer to its beginning. This doesn't really passes an array in a common sense, first, because passing an array would include passing its actual length, second, because passing an array (as a value) would imply its copying.
In other word, you really pass an iterator pointing to an array beginning (like std::vector::begin() in C++) but you pretend that you pass the array itself. It's very confusing in fact. So, using pointers represents things those are really happening in a much more clear way, and it definitely should be preferred.
There may be some advantages of array notation too but I don't think they overweight the drawbacks mentioned. First, using array notation emphasizes the difference between pointer to a single value and pointer to continuous block. And then, you may specify an expected size of passed array for your own reference. But that size isn't actually passed to expressions or functions or somehow checked which fact is very confusing.

Related

What does impossibility to return arrays actually mean in C?

I'm not trying to replicate the usual question about C not being able to return arrays but to dig a bit more deeply into it.
We cannot do this:
char f(void)[8] {
char ret;
// ...fill...
return ret;
}
int main(int argc, char ** argv) {
char obj_a[10];
obj_a = f();
}
But we can do:
struct s { char arr[10]; };
struct s f(void) {
struct s ret;
// ...fill...
return ret;
}
int main(int argc, char ** argv) {
struct s obj_a;
obj_a = f();
}
So, I was skimming the ASM code generated by gcc -S and seems to be working with the stack, addressing -x(%rbp) as with any other C function return.
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
Extra data: I am using Linux and gcc on a x64 Intel.
First of all, yes, you can encapsulate an array in a structure, and then do anything you want with that structure (assign it, return it from a function, etc.).
Second of all, as you've discovered, the compiler has little difficulty emitting code to return (or assign) structures. So that's not the reason you can't return arrays, either.
The fundamental reason you cannot do this is that, bluntly stated, arrays are second-class data structures in C. All other data structures are first-class. What are the definitions of "first-class" and "second-class" in this sense? Simply that second-class types cannot be assigned.
(Your next question might be, "Other than arrays, are there any other second-class data types?", and I think the answer is "Not really, unless you count functions".)
Intimately tied up with the fact that you can't return (or assign) arrays is that there are no values of array type, either. There are objects (variables) of array type, but whenever you try to take the value of one, you immediately get a pointer to the array's first element. [Footnote: more formally, there are no rvalues of array type, although an object of array type can be thought of as an lvalue, albeit a non-assignable one.]
So, quite aside from the fact that you can't assign to an array, you can't even generate a value to try to assign. If you say
char a[10], b[10];
a = b;
it's as if you had written
a = &b[0];
So we've got an array on the left, but a pointer on the right, and we'd have a massive type mismatch even if arrays somehow were assignable. Similarly (from your example) if we try to write
a = f();
and somewhere inside the definition of function f() we have
char ret[10];
/* ... fill ... */
return ret;
it's as if that last line said
return &ret[0];
and, again, we have no array value to return and assign to a, merely a pointer.
(In the function call example, we've also got the very significant issue that ret is a local array, perilous to try to return in C. More on this point later.)
Now, part of your question is probably "Why is it this way?", and also "If you can't assign arrays, why can you assign structures containing arrays?"
What follows is my interpretation and my opinion, but it's consistent with what Dennis Ritchie describes in his paper The Development of the C Language.
The non-assignability of arrays arises from three facts:
C is intended to be syntactically and semantically close to the machine hardware. An elementary operation in C should compile down to one or a handful of machine instructions taking one or a handful of processor cycles.
Arrays have always been special, especially in the way they relate to pointers; this special relationship evolved from and was heavily influenced by the treatment of arrays in C's predecessor language B.
Structures weren't initially in C.
Due to point 2, it's impossible to assign arrays, and due to point 1, it shouldn't be possible anyway, because a single assignment operator = shouldn't expand to code that might take N thousand cycles to copy an N thousand element array.
And then we get to point 3, which really ends up leading to a contradiction.
When C got structures, they initially weren't fully first-class either, in that you couldn't assign or return them. But the reason you couldn't was simply that the first compiler wasn't smart enough, at first, to generate the code. There was no syntactic or semantic roadblock, as there was for arrays.
And the goal all along was for structures to be first-class, and this was achieved relatively early on. The compiler caught up, and learned how to emit code to assign and return structures, shortly around the time that the first edition of K&R was going to print.
But the question remains, if an elementary operation is supposed to compile down to a small number of instructions and cycles, why doesn't that argument disallow structure assignment? And the answer is, yes, it's a contradiction.
I believe (though this is more speculation on my part) that the thinking was something like this: "First-class types are good, second-class types are unfortunate. We're stuck with second-class status for arrays, but we can do better with structs. The no-expensive-code rule isn't really a rule, it's more of a guideline. Arrays will often be large, but structs will usually be small, tens or hundreds of bytes, so assigning them won't usually be too expensive."
So a consistent application of the no-expensive-code rule fell by the wayside. C has never been perfectly regular or consistent, anyway. (Nor, for that matter, are the vast majority of successful languages, human as well as artificial.)
With all of this said, it may be worth asking, "What if C did support assigning and returning arrays? How might that work?" And the answer will have to involve some way of turning off the default behavior of arrays in expressions, namely that they tend to turn into pointers to their first element.
Sometime back in the '90's, IIRC, there was a fairly well-thought-out proposal to do exactly this. I think it involved enclosing an array expression in [ ] or [[ ]] or something. Today I can't seem to find any mention of that proposal (though I'd be grateful if someone can provide a reference). At any rate, I believe we could extend C to allow array assignment by taking the following three steps:
Remove the prohibition of using an array on the left-hand side of an assignment operator.
Remove the prohibition of declaring array-valued functions. Going back to the original question, make char f(void)[8] { ... } legal.
(This is the biggie.) Have a way of mentioning an array in an expression and ending up with a true, assignable value (an rvalue) of array type. For the sake of argument I'll posit a new operator or pseudofunction called arrayval( ... ).
[Side note: Today we have a "key definition" of array/pointer correspondence, namely that:
A reference to an object of array type which appears in an expression decays (with three exceptions) into a pointer to its first element.
The three exceptions are when the array is the operand of a sizeof operator, or a & operator, or is a string literal initializer for a character array. Under the hypothetical modifications I'm discussing here, there would be a fourth exception, namely when the array was an operand of this new arrayval operator.]
Anyway, with these modifications in place, we could write things like
char a[8], b[8] = "Hello";
a = arrayval(b);
(Obviously we would also have to decide what to do if a and b were not the same size.)
Given the function prototype
char f(void)[8];
we could also do
a = f();
Let's look at f's hypothetical definition. We might have something like
char f(void)[8] {
char ret[8];
/* ... fill ... */
return arrayval(ret);
}
Note that (with the exception of the hypothetical new arrayval() operator) this is just about what Dario Rodriguez originally posted. Also note that — in the hypothetical world where array assignment was legal, and something like arrayval() existed — this would actually work! In particular, it would not suffer the problem of returning a soon-to-be-invalid pointer to the local array ret. It would return a copy of the array, so there would be no problem at all — it would be just about perfectly analogous to the obviously-legal
int g(void) {
int ret;
/* ... compute ... */
return ret;
}
Finally, returning to the side question of "Are there any other second-class types?", I think it's more than a coincidence that functions, like arrays, automatically have their address taken when they are not being used as themselves (that is, as functions or arrays), and that there are similarly no rvalues of function type. But this is mostly an idle musing, because I don't think I've ever heard functions referred to as "second-class" types in C. (Perhaps they have, and I've forgotten.)
Footnote: Because the compiler is willing to assign structures, and typically knows how to emit efficient code for doing so, it used to be a somewhat popular trick to co-opt the compiler's struct-copying machinery in order to copy arbitrary bytes from point a to point b. In particular, you could write this somewhat strange-looking macro:
#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
*(struct foo *)(a))
that behaved more or less exactly like an optimized in-line version of memcpy(). (And in fact, this trick still compiles and works under modern compilers today.)
What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.
It has nothing to do with capability per se. Other languages do provide the ability to return arrays, and you already know that in C you can return a struct with an array member. On the other hand, yet other languages have the same limitation that C does, and even more so. Java, for instance, cannot return arrays, nor indeed objects of any type, from methods. It can return only primitives and references to objects.
No, it is simply a question of language design. As with most other things to do with arrays, the design points here revolve around C's provision that expressions of array type are automatically converted to pointers in almost all contexts. The value provided in a return statement is no exception, so C has no way of even expressing the return of an array itself. A different choice could have been made, but it simply wasn't.
For arrays to be first-class objects, you would expect at least to be able to assign them. But that requires knowledge of the size, and the C type system is not powerful enough to attach sizes to any types. C++ could do it, but doesn't due to legacy concerns—it has references to arrays of particular size (typedef char (&some_chars)[32]), but plain arrays are still implicitly converted to pointers as in C. C++ has std::array instead, which is basically the aforementioned array-within-struct plus some syntactic sugar.
Bounty hunting.
The authors of C did not aspire to be language or type system designers. They were tool designers. C was a tool to make system programming easier.
ref: B Kernighan on Pascal Ritchie on C
There was no compelling case for C to do anything unexpected; especially as UNIX and C were ushering in the era of least surprise. Copying arrays, and making complex syntax to do so when it was the metaphorical equivalent of having a setting to burn the toast did not fit the C model.
Everything in C, the language, is effectively constant time, constant size. C, the standard, seems preoccupied with doing away with this core feature which made C so popular; so expect the, uh, standard C/2023.feb07 to feature a punctuation nightmare that enables arrays as r-values.
The decision of the C authors makes eminent sense if you view the programming world pragmatically. If you view it as a pulpit for treasured beliefs, then get onboard for C/2023.feb07 before C/2023.feb08 nullifies it.
I'm afraid in my mind it's not so much a debate of first or second class objects, it's a religious discussion of good practice and applicable practice for deep embedded applications.
Returning a structure either means a root structure being changed by stealth in the depths of the call sequence, or a duplication of data and the passing of large chunks of duplicated data. The main applications of C are still largely concentrated around the deep embedded applications. In these domains you have small processors that don't need to be passing large blocks of data. You also have engineering practice that necessitates the need to be able to operate without dynamic RAM allocation, and with minimal stack and often no heap. It could be argued the return of the structure is the same as modification via pointer, but abstracted in syntax... I'm afraid I'd argue that's not in the C philosophy of "what you see is what you get" in the same way a pointer to a type is.
Personally, I would argue you have found a loop hole, whether standard approved or not. C is designed in such a way that allocation is explicit. You pass as a matter of good practice address bus sized objects, normally in an aspirational one cycle, referring to memory that has been allocated explicitly at a controlled time within the developers ken. This makes sense in terms of code efficiency, cycle efficiency, and offers the most control and clarity of purpose. I'm afraid, in code inspection I'd throw out a function returning a structure as bad practice. C does not enforce many rules, it's a language for professional engineers in many ways as it relies upon the user enforcing their own discipline. Just because you can, doesn't mean you should... It does offer some pretty bullet proof ways to handle data of very complex size and type utilising compile time rigour and minimising the dynamic variations of footprint and at runtime.

Assigning a zero to all array elements in C

In the program I'm working on, this particular operation is definitely not going to be the bottleneck, but it did get me thinking. From the answers to questions such as this one and this one I've learned two ways to easily (efficiently) set all the elements of an array to zero in C:
double myArray[3];
static const double zeroes[3] = {0};
memcpy(myArray, zeroes, sizeof(zeroes));
and
double myArray[3];
memset(myArray, 0, numberOfElementsInMyArray * sizeof(myArray[0]));
Before I move onto my real question: I'm not entirely sure but based on the information I've read, I assume this method would, at least in principle, fill the array with int zeroes (well, unsigned char's but these seem to be fairly equivalent). Is that correct? If so, is an explicit conversion of the int zeroes to double zeroes necessary or is it done implicitly if myArray is declared as an array of double's?
Anyway, my real question is this: if the array isn't very big at all (like the myArray I've declared above), is either of these methods still preferred over a little loop? What if you have a few arrays of the same small size that all need to be assigned zeroes? If commented properly, do you think readability is a factor in the decision and favours a particular solution?
Just to be entirely clear: I am not looking to initialize an array to zeroes.
If it's just a small array (like three elements), it probably won't make much difference whether you use mem* functions, or a loop, or three distinct assignments. In fact, that latter case may even be faster as you're not suffering the cost of a function call:
myArry[0] = myArray[1] = myArray[2] = 0;
But, even if one is faster, the difference would probably not be worth worrying about. I tend to optimise for readability first then, if needed, optimise for space/storage later.
If it was a choice between memcpy and memset, I'd choose the latter (assuming, as seems to be the case, that the all-zero bit pattern actually represented 0.0 in your implementation) for two reasons:
it doesn't require storage of a zeroed array; and
the former will get you into trouble if you change the size of one array and forget the other.
And, for what it's worth, your memset solution doesn't need to have the multiplication. Since you can get the size of the entire array, you can just do:
memset (myArray, 0, sizeof (myArray));
i think the first method of setting without using a loop is better for performance
what happen is that the merroy of array is bitwised by 0 (& 0) so it faster than using a loop for each element in the array.

Why isn't there an array length function in C?

While there are various ways to find the length of an array in C, the language doesn't provide one.
What was the reason for not including such a common operation in C or any of its revisions?
One of the guiding philosophies of C's design is that all data types map directly to memory, and attempting to store metadata for array types such as length runs counter to that philosophy.
From an article by Dennis Ritchie describing the development of C, we find this:
Embryonic C
...
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
Emphasis mine. Just replace the term "pointer" with "metadata" in the passage above, and I think we have the answer to your question.
Unless someone here happens to be on the C standard committee, you're unlikely to get an authoritative answer. But two reasons I can think of:
In many (most?) situations, you don't have an array, you just have a pointer.
Storing metadata about the array increases the storage size, etc. The general rule of C is that you don't pay for what you don't use.
C is not object-oriented, so it has no concept of methods that are attached to objects. It was designed with speed and simplicity in mind, and the common idiom sizeof (array) / sizeof(array[0]) is short and straightforward.
It is down to efficiency. C is a very efficient programing language.
Array syntax in C is just a syntactic sugar for pointer arithmetic. If you want to have a real array with length and bounds checking, you can create a struct which contains an array pointer and its length and access it only through functions which check bounds.

What is faster in C: structs or arrays?

I want to implement (what represents abstractly) a two dimensional 4x4 matrix. All the code I write for matrix multiplication et cetera will be entirely "unrolled" as it were -- that is to say, I will not be using loops to access and write data entries in the matrix.
My question is: In C, would it be faster to use a struct as such:
typedef struct {
double e0, e1, e2, e3, e4, ..., e15
} My4x4Matrix;
Or would this be faster:
typedef double My4x4Matrix[16];
Given that I will be accessing each matrix element individually as such:
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c.e0=a.e0+b.e0;
c.e1=a.e1+b.e1;
...
Or
My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c[0]=a[0]+b[0];
c[1]=a[1]+b[1];
...
Or are they exactly the same speed?
Any decent compiler will generate the exact same code, byte-for-byte. However, using arrays allows you a lot more flexibility; when accessing the matrix elements, you can choose whether you want to access fixed locations or address positions with variables.
I also highly question your choice to "unwind" (unroll?) all the operations by hand. Any good compiler can fully unroll loops with a constant number of iterations for you, and can perhaps even generate SIMD code and/or optimally schedule the order of instructions. You'll have a hard time doing better by hand, and you'll end up with code that's hideous to read. The fact that you asked this question suggests to me that you're probably not sufficiently experienced to do better than even a naive optimizing compiler.
Struct elements (fields) can only be accessed by their names explicitly specified in the program's source, which means that every time you access a field the actual field must be selected and hardcoded at compile time. If you wanted to implement the same thing with arrays, that would mean that you would use explicit constant compile-time array indices (as in your example). In this case the performance of the two will be exactly the same and the code generated will be exactly the same (excluding from consideration "malicious" compilers).
However, note that arrays provide you with an extra degree of freedom: if necessary, you can select array elements by a run-time index. This is something that's not possible with structs. Only you know whether it matters to you.
On the other hand, note also that arrays in C are not copyable, which means that you'll be forced to use memcpy to copy your array-based My4x4Matrix. With struct-based version normal language-level copying will work. With arrays this issue can be worked around by wrapping the actual array in a struct.
I guess both are the same speed. The difference between a struct and an array is just its meaning (in human terms.) Both will be compiled as memory addresses.
I would say the best way is to create a test to try it yourself. Results may vary based on system environments and compilers.

Why can I assign structs but not compare them

Even though I am a long time C programmer, I only recently learned that one can directly assign structure variables to one another instead of using memcpy:
struct MyStruct a,b;
...
a = b; /* implicit memcpy */
Though this feels a bit "high-level" for C, it is definitely useful. But why can't I do equality and inequality comparison:
if (a == b) ...
if (a != b) ...
Is there any good reason for the standard to exclude this? Or is this an inconsistency in the - otherwise very elegant - standard?
I don't see why I can replace my memcpy's for clean assignments, but I have to keep those ugly memcmp's in place.
Per the comp.lang.c FAQ:
There is no good way for a compiler
to implement structure comparison
(i.e. to support the == operator for
structures) which is consistent with
C's low-level flavor. A simple
byte-by-byte comparison could founder
on random bits present in unused
"holes" in the structure (such
padding is used to keep the alignment
of later fields correct). A field-by-field comparison might require unacceptable amounts of
repetitive code for large structures.
Any compiler-generated comparison
could not be expected to compare
pointer fields appropriately in all
cases: for example, it's often
appropriate to compare char * fields
with strcmp rather than ==.
If you need to compare two structures,
you'll have to write your own function
to do so, field by field.

Resources