Fastest way to copy an array - Does it have something questionable? - c

When working with arrays of the same length, consider creating a structure which only contains an array so that it is easier to copy the array by simply copying the structure into another one.
The definition and declaration of the structure would be like this:
typedef struct {
char array[X]; /* X is an arbitrary constant */
} Array;
Array array1;
Then, perform the copy by simply doing:
Array array2;
array2 = array1;
I have found that this as the fastest way of copying an array. Does it have any disadvantage?
Edit: X is an arbitrary constant, let's say 10. The array is not a variable length array.

X may be arbitrary (up until the limits of your compile environment), but it's (naturally) the same number for all objects of type Array. If you need many arrays of same length and type to be copied to each other, there's nothing inherently wrong about this, although accessing these arrays might be more cumbersome than usual.
Array array1;
array1.array[0] // to access the first element

This works fine, but since X must be defined at compile-time it is rather inflexible. You can get the same performance over an array of any length by using memcpy instead. In fact, compilers will usually translate array2 = array1 into a call to memcpy.
As mentioned in the comments, whether direct assignment gets translated into a memcpy call depends on the size of the array amongst other compiler heuristics.

It's entirely dependent on the compiler and level of optimizations you use. The compiler "should" reduce it to memcpy with a constant size, which should in turn be reduced to whatever machine specific operations exist for copying various size blocks of memory. Small blocks should be highly machine specific these days. Actually calling the memcpy library function to copy 4 bytes would be so "10 years ago."
With optimizations off all performance bets are off.

Related

Dynamic vs dynamically created arrays in C

I'm in a Programming I class and this is an excerpt from my textbook:
"There are two basic ways to create an array, statically and dynamically. Note that a dynamically created array is not the same thing as a dynamic array; a dynamically created array can only be fixed-size in C. "
My professor is saying things that pretty directly contradict this quote, and is being very evasive when I ask further questions. He doesn't seem to acknowledge that there is a difference between dynamic vs fixed-size and dynamically-created vs. statically-created. I don't know enough about C to argue with him and since he's the one who wrote the textbook, I'm a little lost at the moment.
What is the difference between statically-created vs. dynamically-created and dynamic vs. static arrays?
Do "dynamic" (not dynamically-created) arrays exist in C?
The textbook is "The Art and Craft of Programming: C Edition" by John Lusth. Actually I was wrong about my professor being the one who wrote it, the author is a different CS professor at my school.
When the professor uses word dynamic it means that an array can change its size on the fly. That is new elements can be added to or deleted from the array.
A dynamically allocated array means the allocation of an array at run-time in the heap. Statically allocated arrays are allocated before the main function gets the control.
Take into account that C has Variable Length Arrays (VLA). Bit it is not the same as dynamic arrays. VLA means that an array may be recreated with different sizes. But in each such recreation you create a new array.
An example of a dynamic array is standard C++ class std::vector.
The answer to this question will depend on how pedantially one wants to treat terms like "array" and "dynamic". Is "array" supposed to refer exclusibely to array types? Or are we allowed to include malloc-ed arrays as well, accessible through pointers? Does "dynamic" refer to dynamic memory (even though the standard C nomenclature does not use this term)? Or are we allowed to consider local VLAs as "dynamic" as well?
Anyway, one can separate arrays into three conceptual categories
Arrays with compile-time size
Arrays with run-time initial size, which cannot be resized
Arrays with run-time initial size, which can be resized
Apparently, your professor referred to the second category as "dynamically created arrays" and to the third category as "dynamic" arrays.
For example, arrays from the first category are the classic built-in C89/90-style C arrays
int a[10];
Arrays from the second category are C99 VLAs
int a[n];
(or new-ed arrays in C++).
Arrays from the third category are arrays allocated with malloc
int *a = malloc(n * sizeof *a);
which can be later resized with realloc.
Of course, once we step beyond the built-in features of the core language, the division between these categories becomes purely conceptual. It is just a matter of the interface the array implementation offers to the user. For example, it is possible to implement arrays from the third category through arrays of the second category, by destroying the old fixed-size array and allocating a new one of different size. In fact, that is how realloc is allowed to work in general case.
The issue here is that the terminology is not formally defined, so when different people use these words, they may mean different things.
I think your textbook is distinguishing between these three definitions:
Static array
An array whose size is hard-coded into the source:
int ages[100];
ages[0] = 1;
Disadvantage: you have to know how big to make it, when you code.
Advantage: the runtime automatically takes back the storage when the variable goes out of scope.
Dynamically allocated array
An array whose size is decided at runtime, before creating the array.
numberOfAges = (some calculation);
int *ages = (int*) malloc(numberOfAges);
ages[0] = 1;
In this case, the size of the array isn't known at compile-time. It is decided at runtime, but once the array has been created, its size cannot change.
Advantage: You can make it different sizes depending on runtime requirements.
Disadvantage: You have to make your own code call free() to reclaim the storage.
Dynamic arrays
This is an array whose size grows or shrinks during its lifespan.
A hypothetical language might have statements like:
resize(ages, 5); // make "ages" 5 longer
truncate(ages, 3); // make "ages" 3 long, discarding later elements.
What your professor is saying, correctly, is that the core of C does not have arrays that can do this. A char* is a fixed size at the point it's allocated, and that can never change.
In C, if you want a list of values whose length grows or shrinks, you have to roll your own code to do it, or use a library of functions that provides it. Rather than working directly with arrays, you'd work with the API provided by the library. Indeed, it might look very much like the hypothetical example above, except that ages would not be an int* - it would be some type provided by the library.
#include <resizeableIntArrays.h>
...
ResizeableIntArray ages = allocateResizeableIntArray(100);
resize(ages,80);
There are lots of ways to achieve this - using realloc(), using linked lists or binary trees, using linked lists of arrays. I suspect when your professor is "evasive", he's really saving the more complicated stuff until later. Any respectable C course will get to this stuff eventually, but if ordinary arrays are new to you, it'll be a few weeks before you're ready for linked lists.
statically created arrays are those whose size u give as a constant during the declaration, like
int arr[10];
dynamically created arrays are those whose size is given as a variable. as variables can take any value during the array declaration, size depends on the variable value at that program instance, like
int n;
scanf("%d",&n);
int arr[n];
for your second question: dynamic arrays (arrays that change their size during program execution) do not exist in C. u can create a link list with dynamic memory allocation, which would be in essence a dynamic linear data structure (in contrast to a static linear data structure that array is).
Dynamically created array means array is created at run time. Either it would be a variable length arrays
int n = 5;
int a[n];
or created using malloc
int *a;
a = malloc(sizeof(int) * 5); // But note that pointers are not arrays
In C, there is no dynamic arrays. You can't add or delete new elements to it, i.e. once an array is created its size can't be changed.

How does a C program get information from an array internally?

I'm relatively new to programming, so when someone suggested that building an array of structs (each containing n attributes of a particular "item") was faster than building n arrays of attributes, I found that I didn't know enough about arrays to argue one way or the other.
I read this:
how do arrays work internally in c/c++
and
Basic Arrays Tutorial
But I still don't really understand how a C program retrieves a particular value from an array by index.
It seems pretty clear that data elements of the array are stored adjacent in memory, and that the array name points to the first element.
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data, or does the program have to somehow iterate over every intermediary piece of data before it gets there (as in the linkedlist data structure)?
More fundamentally, if a program asks for a piece of information by its memory address, how does the machine find it?
Let's take a simpler example. Let's say you have a array int test[10] which is stored like this at address 1000:
1|2|3|4|5|6|7|8|9|10
The complier knows that, for example, an int is 4 bytes. The array access formula is this:
baseaddr + sizeof(type) * index
The size of a struct is just the sum of the sizes of its elements plus any padding added by the compiler. So the size of this struct:
struct test {
int i;
char c;
}
Might be 5. It also might not be, because of padding.
As for your last question, very shortly (this is very complicated) the MMU uses the page table to translate the virtual address to a physical address, which is then requested, which if it's in cache, is returned, and otherwise is fetched from main memory.
You wrote:
Are C programs smart enough to do the arithmetic based on data-type and index to figure out the exact memory address of the target data
Yes, that is exactly what they do. They do not iterate over intervening items (and doing so would not help since there are no markers to guide beginning and end of each item).
so here is the whole trick, array elements are adjacent in memory.
when you declare an array for example: int A[10];
the variable A is a pointer to the first element in the array.
now comes the indexing part, whenever you do A[i] it is exactly as if you were doing *(A+i).
the index is just an offset to the beginning address of the array, also keep in mind that in pointer arithmetic the offset is multiplied by the size of the data type of the array.
to get a better understanding of this write a little code, declare an array and print the address of it and then of each element in the array.
and notice how the offset is always the same and equal to the size of the data type of your array on your machine.

Variable Length Array

I would like to know how a variable length array is managed (what extra variables or data structures are kept on the stack in order to have variable length arrays).
Thanks a lot.
It's just a dynamically sized array (implementation-dependent, but most commonly on the stack). It's pretty much like alloca in the old days, with the exception that sizeof will return the actual size of the array, which implies that the size of the array must also be stored somewhere (implementation-dependent as well, but probably on the stack too).
The size of variable length arrays is determined on run-time, instead of compilation time.
The way it's managed depends on the compiler.
GCC, for instance, allocates memory on the stack.But there is no special structure. It's just a normal array, whose size is known at run-time.
alternatively you could use some containers, e.g. ArrayList in java or vector in c/c++

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#define DECK_SIZE 52
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied when the program is being run), in the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses, and should be used when you can, as static allocation will help make programs more stable, and are a necessity at times due to the need to ensure memory leaks don't happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array is a shorthand way to specify pointing to the beginning of a malloc for example.
But, imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to difficult to find errors.
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
Its also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (i.e. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words - if name above were just a pointer, any of that struct would contain an additional pointer, destroying the perfect mapping of it to an external object (like an directory entry).

Is it a best practice to wrap arrays and their length variable in a struct in C?

I will begin to use C for an Operating Systems course soon and I'm reading up on best practices on using C so that headaches are reduced later on.
This has always been one my first questions regarding arrays as they are easy to screw up.
Is it a common practice out there to bundle an array and its associated variable containing it's length in a struct?
I've never seen it in books and usually they always keep the two separate or use something like sizeof(array[]/array[1]) kind of deal.
But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length.
I am beginning to use C so the above could be horribly wrong, I am still a student.
Cheers,
Kai.
Yes this is a great practice to have in C. It's completely logical to wrap related values into a containing structure.
I would go ever further. It would also serve your purpose to never modify these values directly. Instead write functions which act on these values as a pair inside your struct to change length and alter data. That way you can add invariant checks and also make it very easy to test.
Sure, you can do that. Not sure if I'd call it a best practice, but it's certainly a good idea to make C's rather rudimentary arrays a bit more manageable. If you need dynamic arrays, it's almost a requirement to group the various fields needed to do the bookkeeping together.
Sometimes you have two sizes in that case: one current, and one allocated. This is a tradeoff where you trade fewer allocations for some speed, paying with a bit of memory overhead.
Many times arrays are only used locally, and are of static size, which is why the sizeof operator is so handy to determine the number of elements. Your syntax is slightly off with that, by the way, here's how it usually looks:
int array[4711];
int i;
for(i = 0; i < sizeof array / sizeof *array; i++)
{
/* Do stuff with each element. */
}
Remember that sizeof is not a function, the parenthesis are not always needed.
EDIT: One real-world example of a wrapping exactly as that which you describe is the GArray type provided by glib. The user-visible part of the declaration is exactly what you describe:
typedef struct {
gchar *data;
guint len;
} GArray;
Programs are expected to use the provided API to access the array whenever possible, not poke these fields directly.
There are three ways.
For static array (not dynamically allocated and not passed as pointer) size is knows at compile time so you can used sizeof operator, like this: sizeof(array)/sizeof(array[0])
Use terminator (special value for last array element which cannot be used as regular array value), like null-terminated strings
Use separate value, either as a struct member or independent variable. It doesn't really matter because all the standard functions that work with arrays take separate size variable, however joining the array pointer and size into one struct will increase code readability. I suggest to use to have a cleaner interface for your own functions. Please note that if you pass your struct by value, called function will be able to change the array, but not the size variable, so passing struct pointer would be a better option.
For public API I'd go with the array and the size value separated. That's how it is handled in most (if not all) c library I know. How you handle it internally it's completely up to you. So using a structure plus some helper functions/macros that do the tricky parts for you is a good idea. It's always making me head-ache to re-think how to insert an item or to remove one, so it's a good source of problems. Solving that once and generic, helps you getting bugs from the beginning low. A nice implementation for dynamic and generic arrays is kvec.
I'd say it's a good practice. In fact, it's good enough that in C++ they've put it into the standard library and called it vector. Whenever you talk about arrays in a C++ forum, you'll get inundated with responses that say to use vector instead.
I don't see anything wrong with doing that but I think the reason that that is not usually done is because of the overhead incurred by such a structure. Most C is bare-metal code for performance reasons and as such, abstractions are often avoided.
I haven't seen it done in books much either, but I've been doing the same the same thing for a while now. It just seems to make sense to "package" those things together. I find it especially useful if you need to return an allocated array from a method for instance.
If you use static arrays you have access to the size of array using sizeof operator. If you'll put it into struct, you can pass it to function by value, reference and pointer. Passing argument by reference and by pointer is the same on assembly level (I'm almost sure of it).
But if you use dynamic arrays, you don't know the size of array at compile time. So you can store this value in struct, but you will also store only a pointer to array in structure:
struct Foo {
int *myarray;
int size;
};
So you can pass this structure by value, but what you realy do is passing pointer to int (pointer to array) and int (size of array).
In my opinion it won't help you much. The only thing that is in plus, is that you store the size and the array in one place and it is easy to get the size of the array. If you will use a lot of dynamic arrays you can do it this way. But if you will use few arrays, easier will be not to use structures.
I've never seen it done that way, but I haven't done OS level work in over a decade... :-) Seems like a reasonable approach at first glance. Only concern would be to make sure that the size somehow stays accurate... Calculating as needed doesn't have that concern.
considering you can calculate the length of the array (in bytes, that is, not # of elements) and the compiler will replace the sizeof() calls with the actual value (its not a function, calls to it are replaced by the compiler with the value it 'returns'), then the only reason you'd want to wrap it in a struct is for readability.
It isn't a common practice, if you did it, someone looking at your code would assume the length field was some special value, and not just the array size. That's a danger lurking in your proposal, so you'd have to be careful to comment it properly.
I think this part of your question is backwards:
"But with wrapping the two into a struct, you'd be able to pass the struct both by value and by reference which you can't really do with arrays unless using pointers, in which case you have to again keep track of the array length."
Using pointers when passing arrays around is the default behavior; that doesn't buy you anything in terms of passing the entire array by value. If you want to copy the entire array instead of having it decay to a pointer you NEED to wrap the array in a struct. See this answer for more information:
Pass array by value to recursive function possible?
And here is more on the special behavior of arrays:
http://denniskubes.com/2012/08/20/is-c-pass-by-value-or-reference/

Resources