In C, the idea of an array is very straightforward: simply a pointer to the first element in a row of elements in memory, which can be accessed via pointer arithmetic or the standard array[i] syntax.
However, in languages like Google Go, "arrays are values", not pointers. What does that mean? How is it implemented?
In most cases they're the same as C arrays, but the compiler/interpreter hides the pointer from you. This is mainly because then the array can be relocated in memory in a totally transparent way, and so such arrays appear to have an ability to be resized.
On the other hand, it is safer: because you cannot manipulate the pointer yourself, you cannot cause a leak through it.
Since then (2010), the article Slices: usage and internals is a bit more precise:
The in-memory representation of [4]int is just four integer values laid out sequentially:
Go's arrays are values.
An array variable denotes the entire array; it is not a pointer to the first array element (as would be the case in C).
This means that when you assign or pass around an array value you will make a copy of its contents. (To avoid the copy you could pass a pointer to the array, but then that's a pointer to an array, not an array.)
One way to think about arrays is as a sort of struct but with indexed rather than named fields: a fixed-size composite value.
Arrays in Go are also values in that they are passed by value to functions (in the same way as ints, strings, floats, etc.).
This requires copying the whole array for each function call.
That can be very slow for a large array, which is why it's usually better to use slices.
So I want to create some functions that take arrays as input, but I don't know how many dimensions the array will have.
Is there a way in C to determine how many dimensions an array has? (ideally as a function)
So this is actually a fairly difficult problem in C. Usually this is solved using one of three ways:
Have a special terminating value, like '\0' for strings
Pass the dimensions of the array as a parameter to whatever function you're using
Keep the pointer of the array and its dimension in a struct together
If I remember correctly, there's a way to figure out the size of an allocated array using *nix systems, but I definitely would not recommend doing this. Just keep track of your allocated memory.
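The third option (keeping the pointer and its dimensions together in a struct) could look roughly like the sketch below. The type name ndarray, the MAX_DIMS limit and the helper functions are all made up for illustration; they are not from any library.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DIMS 4   /* arbitrary limit chosen for this sketch */

/* Hypothetical descriptor: the data pointer travels with its shape. */
typedef struct {
    double *data;
    size_t  ndim;
    size_t  shape[MAX_DIMS];
} ndarray;

/* Allocates a zero-filled array. Returns 0 on success, -1 on failure.
   Overflow of the element count is not checked in this sketch. */
int ndarray_init(ndarray *a, size_t ndim, const size_t *shape)
{
    assert(a != NULL && shape != NULL);
    if (ndim == 0 || ndim > MAX_DIMS)
        return -1;
    size_t total = 1;
    for (size_t i = 0; i < ndim; i++) {
        a->shape[i] = shape[i];
        total *= shape[i];
    }
    a->ndim = ndim;
    a->data = calloc(total, sizeof *a->data);
    return a->data ? 0 : -1;
}

/* The "how many dimensions?" question becomes a field lookup. */
size_t ndarray_ndim(const ndarray *a)
{
    return a->ndim;
}

void ndarray_free(ndarray *a)
{
    free(a->data);
    a->data = NULL;
    a->ndim = 0;
}
```

Any function that takes a const ndarray * can then ask for ndarray_ndim(a) and a->shape[i] instead of needing extra size parameters.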
As I continue learning the C language, I have a question. What are the differences between using an array in which each element is a struct and using an array in which each element is a pointer to the same type of struct? It seems to me that you can use both equally (although with the pointers you have to deal with memory allocation). Can somebody explain to me in which cases it is better to use one or the other?
Thank you.
Arrays of structures and arrays of pointers to structures are different ways to organize memory.
Arrays of structures have these strong points:
it is easy to allocate such an array dynamically in one step with struct s *p = calloc(n, sizeof(*p));.
if the array is part of an enclosing structure, no separate allocation code is needed at all. The same is true for local and global arrays.
the array is a contiguous block of memory, a pointer to the next and previous elements can be easily computed as struct s *prev = p - 1, *next = p + 1;
accessing array element members may be faster as they are close in memory, increasing cache efficiency.
They also have disadvantages:
the size of the array must be passed explicitly as there is no way to tell from the pointer to the array how many elements it has.
the expression p[i].member generates a multiplication, which may be costly on some architectures if the size of the structure is not a power of 2.
changing the order of elements is costly as it may involve copying large amounts of memory.
Using an array of pointers has these advantages:
the size of the array could be determined by allocating an extra element and setting it to NULL. This convention is used for the argv[] array of command line arguments provided to the main() function.
if the above convention is not used, and the number of elements is passed separately, NULL pointer values could be used to specify missing elements.
it is easy to change the order of elements by just moving the pointers.
multiple elements could be made to point to the same structure.
reallocating the array is easier as only the array of pointers needs reallocation, optionally keeping separate length and size counts to minimize reallocations. Incremental allocation is easy too.
the expression p[i]->member generates a simple shift and an extra memory access, but may be more efficient than the equivalent expression for arrays of structures.
and the following drawbacks:
allocating and freeing this indirect array is more cumbersome. An extra loop is required to allocate and/or initialize the structures pointed to by the array.
access to structure elements involves an extra memory indirection. Compilers can generate efficient code for this if multiple members are accessed in the same function, but not always.
pointers to adjacent structures cannot be derived from a pointer to a given element.
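To make the trade-offs above concrete, here is a minimal sketch of both layouts. struct s and its members, and all the function names, are placeholders invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct s { int id; double weight; };   /* placeholder element type */

/* Array of structures: one allocation, released with a single free(). */
struct s *make_struct_array(size_t n)
{
    struct s *p = calloc(n, sizeof *p);
    if (!p)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i].id = (int)i;
    return p;
}

/* Frees a NULL-terminated array of pointers and everything it points to. */
void free_pointer_array(struct s **pp)
{
    if (!pp)
        return;
    for (size_t i = 0; pp[i] != NULL; i++)
        free(pp[i]);
    free(pp);
}

/* Array of pointers, NULL-terminated like argv: n + 1 slots plus n structs. */
struct s **make_pointer_array(size_t n)
{
    struct s **pp = malloc((n + 1) * sizeof *pp);
    if (!pp)
        return NULL;
    pp[0] = NULL;
    for (size_t i = 0; i < n; i++) {
        pp[i] = calloc(1, sizeof **pp);
        if (!pp[i]) {               /* pp[i] == NULL terminates the array */
            free_pointer_array(pp);
            return NULL;
        }
        pp[i]->id = (int)i;
        pp[i + 1] = NULL;           /* keep the array terminated at every step */
    }
    return pp;
}

/* The length can be recovered from the terminator, no separate count needed. */
size_t pointer_array_length(struct s *const *pp)
{
    assert(pp != NULL);
    size_t n = 0;
    while (pp[n] != NULL)
        n++;
    return n;
}
```

Note the asymmetry: the array of structures needs one calloc and one free, while the array of pointers needs a loop for both. In exchange, reordering the pointer array is just a swap of two pointers.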
EDIT: As hinted by David Bowling, one can combine some of the advantages of both approaches by allocating an array of structures on one hand and a separate array of pointers pointing to the elements of the first array. This is a handy way to implement a sort order, or even multiple concomitant sort orders with separate arrays of pointers, like database indexes.
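A rough sketch of that combined approach: the records stay in one contiguous array, and each sort order is a separate array of pointers into it. struct rec, build_index and the two comparators are hypothetical names for this example; only qsort and strcmp come from the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rec { int id; const char *name; };   /* placeholder record type */

/* Comparators receive pointers to the index entries (struct rec *). */
int by_name(const void *a, const void *b)
{
    const struct rec *ra = *(const struct rec *const *)a;
    const struct rec *rb = *(const struct rec *const *)b;
    return strcmp(ra->name, rb->name);
}

int by_id(const void *a, const void *b)
{
    const struct rec *ra = *(const struct rec *const *)a;
    const struct rec *rb = *(const struct rec *const *)b;
    return (ra->id > rb->id) - (ra->id < rb->id);
}

/* Builds an index of n pointers into recs, sorted with cmp.
   The records themselves are never moved. The caller frees the index. */
struct rec **build_index(struct rec *recs, size_t n,
                         int (*cmp)(const void *, const void *))
{
    assert(cmp != NULL);
    struct rec **idx = malloc(n * sizeof *idx);
    if (!idx)
        return NULL;
    for (size_t i = 0; i < n; i++)
        idx[i] = &recs[i];
    qsort(idx, n, sizeof *idx, cmp);
    return idx;
}
```

Calling build_index twice with different comparators gives two independent orderings over the same records, much like two database indexes on one table.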
To motivate my question, consider the case when dealing with jagged arrays (for simplicity's sake) of element type Int in Julia. There are two ways to store them:
As Vector{Vector{Int}}
As Vector{Union{Vector{Int}, Int}} (especially, if one expects to store a sufficiently large number of 1-element vectors)
My question is which one is more efficient / faster / better?
To answer it, among the other things, I need to know how each is stored in memory. Namely:
I presume that a variable of type Vector{Vector{Int}} would be considered a homogeneous array, and therefore I would expect it to be stored contiguously in memory, and as such to be more CPU-cache-friendly. Am I right? Or does contiguity only apply to arrays whose element type is primitive?
Would a variable of type Vector{Union{Vector{Int}, Int}} be considered a heterogeneous array, and as such not be stored contiguously in memory?
How does the benefit of a contiguous representation in memory compare to the benefit of not having an array container for 1-element members, i.e. storing them as a primitive data type (Int in this case)? Which one yields more efficiency?
Julia's arrays will only store elements of type T unboxed if isbits(T) is true. That is, the elements must be both immutable and pointer-free. An easy way to see if the elements are being stored immediately is by allocating an uninitialized array. Contiguous arrays of unboxed (immediate) values will have gibberish:
julia> Array(Int, 3)
3-element Array{Int64,1}:
4430901168
4470602000
4430901232
whereas arrays of non-isbits types will have #undef pointers:
julia> Array(Vector{Int}, 3)
3-element Array{Array{Int64,1},1}:
#undef
#undef
#undef
Imagine what would happen if the latter returned one contiguous chunk of Ints. How would it know how big to make it? Or where one vector stopped and the next began? It would depend upon the sizes of the vectors, which isn't known yet.
A Vector{Union{Vector{Int}, Int}} will similarly store its elements as pointers; this time it's because Julia doesn't know how to interpret each element inline (should it read the memory like an integer or like an array?). It has the additional disadvantage that Julia no longer knows what type it'll return from indexing. This is a type-instability, and will certainly be much worse for performance than just using one-element vectors.
It is possible to create your own ragged array type that stores its elements inline, but it's very tricky to make it work with the standard library like a normal array since it breaks a lot of assumptions about how indexing works. You can take a look at my latest attempt: RaggedArrays.jl. You can see how I compare it to previous efforts in Issue#2.
In C, a struct (record data structure) can be the return type of a function, but an array cannot be. What design characteristics of the C Language cause arrays to be an exception?
A naked array type in C language is not copyable for primarily historical reasons. For this reason it is not possible to initialize arrays with arrays, assign arrays to arrays, pass arrays by value as parameters or return arrays from functions. (Initialization context has a notable exception of char s[6] = "Hello";.)
It is still possible to do all the above if the array is wrapped in struct type, which demonstrates that the limitation is purely declarative in nature. There's no compelling technical reason for it.
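For illustration, here is a small sketch of the struct-wrapping trick. struct vec3 and vec3_scaled are made-up names, and static_assert requires C11.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical wrapper type; the array size is fixed at compile time. */
struct vec3 { int v[3]; };

/* The array is stored inline in the struct, not behind a pointer. */
static_assert(sizeof(struct vec3) >= 3 * sizeof(int),
              "struct vec3 contains the whole array");

/* Takes the wrapped array by value and returns a wrapped array by value. */
struct vec3 vec3_scaled(struct vec3 a, int k)
{
    for (int i = 0; i < 3; i++)
        a.v[i] *= k;      /* a is already a private copy of the argument */
    return a;             /* the whole array is copied back to the caller */
}
```

With this wrapper, struct vec3 y = x; copies all three elements, and vec3_scaled(x, 2) leaves x untouched, which is exactly what a bare int[3] does not allow.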
C language inherited its approach to array implementation from its historical predecessors - B and BCPL languages. In B/BCPL arrays were openly implemented as pointers, meaning that an attempt to assign one array to another actually represented assignment of pointers. C language followed a different approach. In C arrays are not pointers, but the interface specification of C arrays is kept superficially compatible with that of B/BCPL. Arrays in C still "pretend" to be pointers in most contexts. This is one reason they are not immediately copyable.
Most obviously, the "lack" is that C doesn't permit a function to return a result of an array type. This is stated explicitly in the language standard.
Array types are, in a sense, second-class citizens in C. In most contexts, an expression of array type is implicitly converted to a pointer to its first element. The exceptions are when the array expression is the operand of sizeof (which yields the size of the array), when it's the operand of unary & (which yields the address of the array), and when it's a string literal in an initializer used to initialize an array object.
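The three exceptions can be observed directly. The following is only a small demonstration; the function names are invented for it, and static_assert assumes a C11 compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

static int a[10];

/* Operand of sizeof: no conversion, the result is the size of the whole array. */
static_assert(sizeof a == 10 * sizeof(int), "sizeof sees the whole array");

/* Operand of unary &: &a has type int (*)[10], so &a + 1 steps over
   all ten elements at once. */
size_t step_of_address_of_array(void)
{
    return (size_t)((char *)(&a + 1) - (char *)&a);
}

/* Anywhere else: a is converted to int *, so a + 1 steps over one element. */
size_t step_of_decayed_array(void)
{
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* String literal as initializer: its characters are copied into s. */
size_t string_init_size(void)
{
    char s[] = "Hello";
    return sizeof s;   /* five letters plus the terminating '\0' */
}
```

The first function returns 10 * sizeof(int), the second returns sizeof(int), and the third returns 6.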
This absolutely does not mean that arrays are "really" pointers; they're not. You'll see people claiming that they are. They're wrong.
Functions return values. You can have a value of a structure type; that value consists of the values of its members. C permits assignment, parameter passing, and function results of structure type. All these manipulate structure values (they deal with them by value, not by reference).
The same is not true for arrays. The rules I mentioned above imply that you can't construct an expression whose value is of an array type. There are array values (consisting of the values of all the array's elements), but such values are difficult or impossible to manipulate directly.
The way C code usually manipulates arrays is by using pointers to individual elements.
It probably wouldn't have been too difficult to have designed C so that fixed-size arrays can be treated as values, with assignment, parameter passing, and so forth. But then you'd run into problems where int[10] and int[11] are two distinct and incompatible types. Most C code that deals with arrays needs to handle arrays whose size is determined at run time. For example, the string functions in <string.h> deal with arrays of characters of any arbitrary length. They do so by using pointers to the elements of the arrays. You couldn't very well have distinct functions for 1-element, 2-element, 3-element, and so forth, arrays.
You can do the equivalent of returning an array value from a function, but it's unfortunately awkward. You can return a structure containing the array -- but then the size of the array has to be fixed at compile time. You can return a pointer to (the first element of) the array -- but then you have to deal with allocating and deallocating memory to hold the array. You can have the caller pass in a pointer to an array -- but that places the burden of memory management on the caller. And so forth.
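The pointer-returning and caller-supplied-buffer variants might look like the sketch below. The function names and the "fill with squares" task are invented purely for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a pointer to (the first element of) a heap-allocated array.
   The caller must free() the result. Returns NULL if allocation fails. */
int *squares_alloc(size_t n)
{
    int *out = malloc(n * sizeof *out);
    if (!out)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (int)(i * i);
    return out;
}

/* The caller owns the buffer and passes its capacity alongside it.
   Returns the number of elements written. */
size_t squares_fill(int *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < cap; i++)
        out[i] = (int)(i * i);
    return cap;
}
```

The first form lets the size be chosen at run time but makes the caller responsible for free(). The second form works with any storage (a local array, a static buffer, heap memory) but the caller has to size it up front.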
Yes, it's all a bit of a mess. But dealing with arrays that can vary in size is genuinely difficult. C gives you all the tools you need to do it, but leaves a lot of the detailed management to you, the programmer. (Other languages provide arrays as first-class types. Many of those languages have compilers or interpreters written in C.)
Suggested reading: Section 6 of the comp.lang.c FAQ.
The characteristic is that in the small and speedy C language you don't want the equivalent of large memcpy operations when returning. If you badly need arrays returned, make them a member of a struct, and voila, array return in C. Sort of, starting with C89 :-)
Or use a memcpy yourself when and where you need it.
While an array can't be returned from a C function, a pointer to the array can be. For a code example of how to do what you're looking for, visit the site:
http://www.tutorialspoint.com/cprogramming/c_return_arrays_from_function.htm
I had never thought about this before, but lately I've been worried about something. In Fortran90(95), say I create a really big array
Integer :: X(1000000)
and then I write a function that takes this array as an argument. When I pass the array to the function (as in myfunc(X)) what exactly happens during run time?
Does the entire array get passed by value and a new copy constructed inside the function?(costly)
Or does the compiler simply pass some sort of reference or pointer to the array?(cheap)
Do the dimension of the array or the declaration of the function make a difference?
In Fortran 90, as in most other programming languages, arrays are passed by reference (technically, this is often a reference to the first item of the array). In Fortran 90, non-array values are also usually passed by reference. So you needn't worry about the size of the arguments you pass: they won't be copied but will simply be passed by reference.
The one thing you don't want to do is something like:
INTEGER :: X(1:1000,1:1000,1:1000)
CALL myRoutine(X(2:999,2:999,2:999))
where myRoutine cannot work with the bounds of the array section for some reason, for example because its dummy argument is declared explicit-shape or assumed-size and so expects contiguous storage. The compiler cannot pass a reference to the slice, since the slice is not contiguous in memory, so it creates a temporary array and copies the values from X. Needless to say, this is very slow. You shouldn't have that issue with a 1D array, even when specifying unit-stride slices, as they are still contiguous in memory.
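To show why the slice forces a copy, here is a loose C analogy in two dimensions. C is row-major while Fortran is column-major, but the gap problem is the same. The function pack_interior is hypothetical and only mimics the kind of temporary a Fortran compiler may build. It is not what any particular compiler emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Copies the interior of an n x n row-major matrix (rows and columns
   1 .. n-2) into a newly allocated, contiguous (n-2) x (n-2) buffer.
   The caller frees the result. Returns NULL if allocation fails. */
double *pack_interior(const double *x, size_t n)
{
    assert(x != NULL && n >= 3);
    size_t m = n - 2;
    double *tmp = malloc(m * m * sizeof *tmp);
    if (!tmp)
        return NULL;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < m; j++)
            /* consecutive interior rows are separated by two skipped
               border elements in x, so no single pointer describes them */
            tmp[i * m + j] = x[(i + 1) * n + (j + 1)];
    return tmp;
}
```

For a 4 x 4 matrix holding 0..15, the interior elements sit at offsets 5, 6, 9 and 10: the jump from 6 to 9 is the gap that prevents passing a plain reference, and every element has to be touched to close it.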