When to use array and when to use cell array? - arrays

In MATLAB, I was trying to put anonymous functions in an array:
>> a=[@(k)0.1/(k+1) @(k)0.1/(k+1)^0.501]
??? Error using ==> horzcat
Nonscalar arrays of function handles are not allowed; use cell arrays
instead.
So I wonder what kinds of elements are allowed in an array, and in a cell array?
For example, I know that in an array, the elements can be numerical or strings. What else?

In short: a cell array is a heterogeneous container, whereas a regular array is homogeneous. In a regular array all of the elements have the same type; in a cell array they can differ. The MATLAB documentation on cell arrays has more detail.
Use a cell array when:
You have different types in your array
You are not sure whether you might extend it to other types in the future
You are working with objects that have an inheritance pattern
You are working with an array of strings; a cell array is preferable to an n-by-m char matrix on almost every occasion
You have a large array and often update a single element inside a function, because of MATLAB's copy-on-write policy
You are working with function handles (as @Pursuit explained)
Prefer regular array when:
All of the elements have the same type
You are updating the whole array in one shot - like mathematical operations.
You want to have type safety
You will not change the data type of the array in the future
You are working with mathematical matrices.
You are working with objects that have no inheritance
More explanation about copy-on-write:
When you pass an array to a function, a pointer/reference is passed.
function foo(x)
disp(x);
end
x= [1 2 3 4 5];
foo(x); %No copy is done here! A pointer is passed.
But when you change it (or a part of it), a copy is created.
function foo(x)
x(4) = x(4) + 1;
end
x= [1 2 3 4 5];
foo(x); %x is copied! At least twice the memory is needed.
In a cell array, only the cell is copied.
function foo(x)
x{4} = x{4} + 1;
end
x= {1 2 3 4 5};
foo(x); %Only x{4} will be copied
Thus, if you call a function that changes a single element on a large array, you are making a lot of copies - that makes it slower. But in a cell array, it is not the case.

Function handles are actually the exception here, and the reason is that the MATLAB syntax becomes surprising if you allow function handles to be part of a non-cell array. For example:
a = @(x)x+1;
a(2); %This returns 3
But, if arrays of function handles were supported, then
b = [@(x)x+1, @(x)x+2];
b(2); %This would return @(x)x+2
b(3) = @(x)x+3; %This would extend the size of the array
So then would this be allowed?
a(2) = @(x)x+2; %Would this extend the size of the previously scalar array?
Longwinded edit: This is documented in the release notes accompanying release R14, which was the first release allowing anonymous functions. Prior to R14 you could create function handles as references to m-file functions, and they could be placed in non-cell arrays. These could only be called using feval (e.g.: fnSin = @sin; output = feval(fnSin, pi)).
When anonymous functions were introduced, the MathWorks updated the syntax to allow a simpler calling convention (e.g. fnSin = @sin; output = fnSin(pi)), which caused an ambiguity when using non-cell arrays of function handles. It looks like they did their best to grandfather this new behavior in, but those grandfathered conditions have certainly expired (this was 2004).

Arrays can store only data with a fixed length, for instance double, single, char, logical, or integer.
The reason is that (I guess) they are stored directly in a block of memory. Cells, on the other hand, are stored as a list of pointers, and each pointer can point to data of a different size.
That's why arrays cannot store variable-length strings, function handles, arrays, or multiple data types.
Those types can have different lengths. For instance, 'bla' has 3 characters and 'blabla' has 6. If they were stored in the same memory block and you wanted to change 'bla' into 'blabla', you would have to shift all the rest of the memory, which would be very slow, and so it's not handled.

Related

Overwriting an existing 2D Array in C

I'm currently writing a project in C, and I need to be able to fill a 2D array with information already stored in another 2D array. In a separate C file, I have this array:
int levelOne[][4] =
{{5,88,128,0},
{153,65,0,0},
{0,144,160,20}}; //First Array
int levelTwo[][4] =
{{5,88,128,0},
{153,65,0,0},
{0,144,160,20}}; //Second Array
And in my main file, I have this variable which I'd like to fill with the information from both of these arrays at different points in my code. (This isn't exactly what I'm doing, but it's the general gist):
#include "arrayFile.c"
void main()
{
int arrayContainer[][4] = levelOne;
while (true)
{
func(arrayContainer);
if(foo)
{
arrayContainer = levelTwo;//Switches to the other array if the conditional is met.
}
}
}
I know this method doesn't work - you can't overwrite items in arrays after they're instantiated. But is there any way to do something like this? I know I'll most likely need to use pointers to do this instead of completely overwriting the array, however there's not a lot of information on the internet about pointers with multidimensional arrays. In this situation, what's best practice?
Also, I don't know exactly how many arrays of 4 there will be, so I wouldn't be able to use a standard 3D array and just switch between indexes, unless there's a way to make a 3D jagged array that I don't know about.
Given the definitions you show, such as they are, all you need is memcpy(arrayContainer, levelTwo, sizeof levelTwo);.
You should ensure that arrayContainer has sufficient memory to contain the copied data and that levelTwo, since it is used as the operand of sizeof, is a designator for the actual array, not a pointer. If it is not, replace sizeof levelTwo with the size of the array in bytes.
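As a minimal sketch of the memcpy approach with the size check made explicit: the helper name load_level and its row-count parameters are hypothetical, introduced only for illustration. Because the function sees pointers rather than the arrays themselves, the caller has to supply the row counts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy srcRows rows of four int into dest, which has
   room for destRows rows. Returns 0 on success, -1 if dest is too small. */
int load_level(int (*dest)[4], size_t destRows, int (*src)[4], size_t srcRows)
{
    if (srcRows > destRows)
        return -1;
    /* sizeof *src is the size of one row, i.e. sizeof(int[4]). */
    memcpy(dest, src, srcRows * sizeof *src);
    return 0;
}
```

At the call site, where levelTwo is still a real array, the row count can be computed as sizeof levelTwo / sizeof levelTwo[0].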
If you do not need the actual memory filled with data but simply need a way to refer to the contents of the different arrays, make arrayContainer a pointer instead of an array, as with int (*arrayContainer)[4];. Then you can use arrayContainer = levelOne; or arrayContainer = levelTwo; to change which data it points to.
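The pointer variant might look roughly like this. The Row typedef, the select_level and sum_level helpers, and the values in the second array are assumptions made for the example (the second array is changed so the switch is observable). Row counts are tracked separately because sizeof applied to the pointer would not give the size of the array it refers to.

```c
#include <stddef.h>

typedef int Row[4];  /* hypothetical alias for "array of four int" */

static int levelOne[][4] = {
    {5, 88, 128, 0},
    {153, 65, 0, 0},
    {0, 144, 160, 20},
};
static int levelTwo[][4] = {
    {7, 10, 20, 30},
    {1, 2, 3, 4},
};

/* Select a level without copying any data; *rows receives its row count. */
Row *select_level(int which, size_t *rows)
{
    if (which == 2) {
        *rows = sizeof levelTwo / sizeof levelTwo[0];
        return levelTwo;
    }
    *rows = sizeof levelOne / sizeof levelOne[0];
    return levelOne;
}

/* Sum every element of a level through a pointer to rows of four int. */
int sum_level(Row *level, size_t rows)
{
    int total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < 4; c++)
            total += level[r][c];
    return total;
}
```

Indexing through the pointer uses the same level[r][c] syntax as indexing the array directly, so the rest of the code does not need to change when the pointer is redirected.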
Also, I don't know exactly how many arrays of 4 there will be, so I wouldn't be able to use a standard 3D array and just switch between indexes, unless there's a way to make a 3D jagged array that I don't know about.
It is entirely possible to have a pointer to dynamically allocated memory which is filled with pointers to arrays of four int, and those pointers can be changed at will.
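One way this could be sketched is a growable table of pointers to rows of four int. All names here (Row, Level, LevelTable, level_table_add, level_table_free) are hypothetical, and storing a row count next to each pointer is an added assumption so that levels of different lengths can coexist.

```c
#include <stdlib.h>

typedef int Row[4];  /* hypothetical alias for "array of four int" */

/* One entry per level: where its rows start and how many there are. */
typedef struct {
    Row *rows;
    size_t count;
} Level;

typedef struct {
    Level *levels;  /* dynamically allocated, grows on demand */
    size_t count;
} LevelTable;

/* Append a level to the table, growing the storage with realloc.
   Returns 0 on success and -1 if memory could not be obtained. */
int level_table_add(LevelTable *table, Row *rows, size_t count)
{
    Level *grown = realloc(table->levels, (table->count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[table->count].rows = rows;
    grown[table->count].count = count;
    table->levels = grown;
    table->count++;
    return 0;
}

/* Release the table itself; the level data it points to is not owned. */
void level_table_free(LevelTable *table)
{
    free(table->levels);
    table->levels = NULL;
    table->count = 0;
}
```

The table only stores pointers, so adding or swapping a level never copies the level data.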

Can Ada functions return arrays?

I read somewhere that Ada allows a function only to return a single item. Since an array can hold multiple items does this mean that I can return the array as a whole or must I return only a single index of the array?
Yes, an Ada function can return an array - or a record.
There can be a knack to using it, though. For example, if you are assigning the return value to a variable, the variable must be exactly the right size to hold the array, and there are two common ways of achieving that.
1) Fixed size array - cleanest way is to define an array type, e.g.
type Vector is array (1 .. 3) of Integer;
function Unit_Vector return Vector;
A : Vector;
begin
A := Unit_Vector;
...
2) Unconstrained array variables.
These are arrays whose size is determined at runtime by the initial assignment to them. Subsequent assignments to them will fail unless the new value happens to have the same size as the old. The trick is to use a declare block - a new scope - so that each assignment to the unconstrained variable is its first assignment. For example:
for i in 1 .. last_file loop
declare
text : String := Read_File(Filename(i));
-- the size of "text" is determined by the file contents
begin
-- process the text here.
for j in text'range loop
if text(j) = '*' then
...
end if;
end loop;
end;
end loop;
One warning : if the array size is tens of megabytes or more, it may not be successfully allocated on the stack. So if this construct raises Storage_Error exceptions, and you can't raise the stack size, you may need to use access types, heap allocation via "new" and deallocation as required.
Yes, an Ada function can return an array. For example, an Ada String is "A one-dimensional array type whose component type is a character type." Several of the functions defined in Ada.Strings.Fixed—including Insert, Delete, Head, Tail and Trim—return a String.

Compatibility of the tricky dynamic arrays with dynamic arrays

Considering the old trick to make an array
Type
IntArray = Array Of Integer;
PIntArray = ^IntArray;
PTDynIntArray = ^TDynIntArray;
TDynIntArray = Array[0..0] Of Integer;
{later...}
GetMem(APTDynIntArray,100*SizeOf(Integer));
APTDynIntArray^[49] := 50
Is there a way to make this tricky array compatible with a standard dynamic array?
For example, suppose I want to translate an old unit (let's say from 1999) that declares
Procedure DoSomething(Data: PTDynIntArray);
The data will be processed using the syntax above (name, dereference, index in brackets). The Delphi compiler does not complain if I pass a PIntArray as the argument, but I get an AV at run time (I guess that Delphi considers, in this case, that PIntArray is the same as PTDynIntArray).
So can these two types (PIntArray and PTDynIntArray) be combined, typecast, or interchanged? How?
You can convert an IntArray (note: not PIntArray) to a PTDynIntArray. The reverse is not generally possible.
An IntArray is stored as a pointer to the first element of the array. The array is preceded by information about the array's length and such, but if your procedure only accesses the array elements, they won't do any harm.
You may, to be explicit, also write it as @IntArray[0].

How to check if "set" in c

If I allocate a C array like this:
int array[ 5 ];
Then, set only one object:
array[ 0 ] = 7;
How can I check whether all the other keys ( array[1], array[2], …) are storing a value? (In this case, of course, they aren't.)
Is there a function like PHP's isset()?
if ( isset(array[ 1 ]) ) ...
There is nothing like this in C. A static array's content is always "set". You could, however, fill in some special value to pretend it is uninitialized, e.g.
// make sure this value isn't really used.
#define UNINITIALIZED 0xcdcdcdcd
int array[5] = {UNINITIALIZED, UNINITIALIZED, UNINITIALIZED, UNINITIALIZED, UNINITIALIZED};
array[0] = 7;
if (array[1] != UNINITIALIZED) {
...
You can't.
Their values are all indeterminate (effectively random) until you assign them.
You could explicitly zero out all values to start with so you at least have a good starting point. But using magic numbers to detect whether an object has been initialized is considered bad practice (whereas initializing variables is considered good practice).
int array[ 5 ] = {0};
But if you want to check whether they have been explicitly set since creation (without using magic numbers), you need to store that information in another structure.
int array[ 5 ] = {0}; // Init all to 0
int isSetFlag[ 5 ] = {0}; // Init all to 0 (false)
int getVal(int index) {return array[index];}
int isSet(int index) {return isSetFlag[index];}
void setVal(int index,int val) {array[index] = val; isSetFlag[index] = 1; }
In C, all the elements will have values (garbage) at the time of allocation. So you cannot really have a function like what you are asking for.
However, you can fill it by default with some standard value, such as 0 using memset() or INT_MIN using a loop (memset() sets individual bytes, so it cannot write INT_MIN into each int), and then write your own isset() code.
I don't know PHP, but one of two things is going on here:
the PHP array is actually a hash map (awk does that)
the PHP array is being filled with nullable types
In either case there is a meaningful concept of "not set" for the values of the array. A C array of a built-in type, on the other hand, has some value in every cell at all times. If the array is uninitialized and is automatic or was allocated on the heap, those values may be random, but they exist.
To get the PHP behavior:
Implement (or find a library with) a hash map and use it instead of an array.
Make it an array of structures which include an isNull field.
Initialize the array to some sentinel value in all cells.
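The second option in the list above, an array of structures carrying a flag, could be sketched as follows. The type and function names are hypothetical, and the flag here is inverted relative to isNull (is_set is true once a value has been stored) so that zero-initialization means "not set".

```c
#include <stdbool.h>
#include <stddef.h>

/* A value paired with a flag recording whether it has been assigned. */
typedef struct {
    int value;
    bool is_set;
} OptionalInt;

/* Store a value and mark the slot as assigned. */
void optional_set(OptionalInt *slot, int value)
{
    slot->value = value;
    slot->is_set = true;
}

/* Count how many of the first n slots have been assigned. */
size_t optional_count_set(const OptionalInt *slots, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (slots[i].is_set)
            count++;
    return count;
}
```

Declaring the array as OptionalInt array[5] = {{0, false}}; zero-initializes every slot, so all elements start out as "not set".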
One solution perhaps is to use a separate array of flags. When you assign one of the elements, set the flag in the boolean array.
You can also use pointers. You can use null pointers to represent data which has not been assigned yet. I made an example below:
int * p_array[3] = {NULL,NULL,NULL};
p_array[0] = malloc(sizeof(int));
*p_array[0] = (int)0;
p_array[2] = malloc(sizeof(int));
*p_array[2] = (int)4;
for (int x = 0; x < 3; x++) {
if (p_array[x] != NULL) {
printf("Element at %i is assigned and the value is %i\n",x,*p_array[x]);
}else{
printf("Element at %i is not assigned.\n",x);
}
}
You could make a function which allocates the memory and sets the data and another function which works like the isset function in PHP by testing for NULL for you.
I hope that helps you.
Edit: Make sure the memory is deallocated once you have finished. Another function could be used to deallocate certain elements or the entire array.
I've used NULL pointers before to signify data has not been created yet or needs to be recreated.
An approach I like is to make 2 arrays, one a bit-array flagging which indices of the array are set, and the other containing the actual values. Even in cases where you don't need to know whether an item in the array is "set" or not, it can be a useful optimization. Zeroing a 1-bit-per-element bit array is a lot faster than initializing an 8-byte-per-element array of size_t, especially if the array will remain sparse (mostly unfilled) for its entire lifetime.
One practical example where I used this trick is in a substring search function, using a Boyer-Moore-style bad-character skip table. The table requires 256 entries of type size_t, but only the ones corresponding to characters which actually appear in the needle string need to be filled. A 1kb (or 2kb on 64-bit) memset would dominate cpu usage in the case of very short searches, leading other implementations to throw around heuristics for whether or not to use the table. But instead, I let the skip table go uninitialized, and used a 256-bit bit array (only 32 bytes to feed to memset) to flag which entries are in use.
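A rough sketch of that bit-array idea, not the author's actual implementation: the SkipTable type and the skip_table_* helpers are hypothetical names, and the sketch assumes 8-bit bytes so that every unsigned char value fits in the 256-entry table.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 256

typedef struct {
    size_t skip[TABLE_SIZE];                   /* left uninitialized on purpose */
    unsigned char used[TABLE_SIZE / CHAR_BIT]; /* one bit per table entry */
} SkipTable;

/* Only the small bit array is cleared; the skip values are not touched. */
void skip_table_init(SkipTable *t)
{
    memset(t->used, 0, sizeof t->used);
}

/* Store a skip value and flag the entry as in use. */
void skip_table_set(SkipTable *t, unsigned char c, size_t value)
{
    t->skip[c] = value;
    t->used[c / CHAR_BIT] |= (unsigned char)(1u << (c % CHAR_BIT));
}

/* Nonzero if the entry for c has been stored since the last init. */
int skip_table_is_set(const SkipTable *t, unsigned char c)
{
    return (t->used[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1u;
}

/* Read an entry, falling back to a default for entries never stored. */
size_t skip_table_get(const SkipTable *t, unsigned char c, size_t fallback)
{
    return skip_table_is_set(t, c) ? t->skip[c] : fallback;
}
```

The skip array is only ever read for entries whose bit is set, so its uninitialized contents are never observed.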

What is the reason C compiler demands that number of columns in a 2d array will be defined?

Given the following function signature:
void readFileData(FILE* fp, double inputMatrix[][], int parameters[])
This doesn't compile.
And the corrected one:
void readFileData(FILE* fp, double inputMatrix[][NUM], int parameters[])
My question is: why does the compiler demand that the number of columns be defined when handling a 2D array in C? Is there a way to pass a 2D array with unknown dimensions to a function?
Thank you.
Built-in multi-dimensional arrays in C (and in C++) are implemented using the "index-translation" approach. That means that a 2D (3D, 4D, etc.) array is laid out in memory as an ordinary 1D array of sufficient size, and access to the elements of such an array is implemented by recalculating the multi-dimensional indices into a corresponding 1D index. For example, if you define a 2D array of size M x N
double inputMatrix[M][N]
in reality, under the hood the compiler creates an array of size M * N
double inputMatrix_[M * N];
Every time you access the element of your array
inputMatrix[i][j]
the compiler translates it into
inputMatrix_[i * N + j]
As you can see, in order to perform the translation the compiler has to know N, but doesn't really need to know M. This translation formula can easily be generalized for arrays with any number of dimensions. It will involve all sizes of the multi-dimensional array except the first one. This is why every time you declare an array, you are required to specify all sizes except the first one.
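The translation formula can be checked against what the compiler actually does. This is a small sketch in which ROWS and COLS play the roles of M and N; the two helper functions are hypothetical names for illustration.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Byte offset of element (i, j) from the start of the matrix,
   as laid out by the compiler. */
size_t actual_offset(double (*matrix)[COLS], size_t i, size_t j)
{
    return (size_t)((const char *)&matrix[i][j] - (const char *)matrix);
}

/* The same offset computed by hand from the formula i * N + j.
   Only the column count appears; the row count is never needed. */
size_t predicted_offset(size_t i, size_t j)
{
    return (i * COLS + j) * sizeof(double);
}
```

For any valid i and j of a double matrix[ROWS][COLS], the two functions should agree.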
As an array in C is purely memory, without any meta information about its dimensions, the compiler needs to know how to apply the row and column index when addressing an element of your matrix.
inputMatrix[i][j] is internally translated to something equivalent to *(inputMatrix + i * NUM + j)
and here you see that NUM is needed.
C doesn't have any specific support for multidimensional arrays. A two-dimensional array such as double inputMatrix[N][M] is just an array of length N whose elements are arrays of length M of doubles.
There are circumstances where you can leave off the number of elements in an array type. This results in an incomplete type — a type whose storage requirements are not known. So you can declare double vector[], which is an array of unspecified size of doubles. However, you can't put objects of incomplete types in an array, because the compiler needs to know the element size when you access elements.
For example, you can write double inputMatrix[][M], which declares an array of unspecified length whose elements are arrays of length M of doubles. The compiler then knows that the address of inputMatrix[i] is i*sizeof(double[M]) bytes beyond the address of inputMatrix[0] (and therefore the address of inputMatrix[i][j] is i*sizeof(double[M])+j*sizeof(double) bytes). Note that it needs to know the value of M; this is why you can't leave off M in the declaration of inputMatrix.
A theoretical consequence of how arrays are laid out is that inputMatrix[i][j] denotes the same address as inputMatrix + M * i + j.¹
A practical consequence of this layout is that for efficient code, you should arrange your arrays so that the dimension that varies most often comes last. For example, if you have a pair of nested loops, you will make better use of the cache with for (i=0; i<N; i++) for (j=0; j<M; j++) ... than with loops nested the other way round. If you need to switch between row access and column access mid-program, it can be beneficial to transpose the matrix (which is better done block by block rather than in columns or in lines).
C89 references: §3.5.4.2 (array types), §3.3.2.1 (array subscript expressions)
C99 references: §6.7.5.2 (array types), §6.5.2.1-3 (array subscript expressions).
¹ Proving that this expression is well-defined is left as an exercise for the reader. Whether inputMatrix[0][M] is a valid way of accessing inputMatrix[1][0] is not so clear, though it would be extremely hard for an implementation to make a difference.
This is because in memory, this is just a contiguous area, a single-dimension array if you will. And to get the real offset of inputMatrix[x][y] the compiler has to calculate (x * elementsPerRow) + y. So it needs to know elementsPerRow (the number of columns), and that in turn means you need to tell it.
No, there's not. The situation's pretty simple really: what the function receives is really just a single, linear block of memory. Telling it the number of columns tells it how to translate something like block[x][y] into a linear address in the block (i.e., it needs to do something like address = row * column_count + column).
Other people have explained why, but the way to pass a 2D array with unknown dimensions is to pass a pointer. The compiler demotes array parameters to pointers anyway. Just make sure it's clear what you expect in your API docs.
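A minimal sketch of the pointer approach, assuming the matrix is stored contiguously in row-major order and that the caller passes both dimensions explicitly (the function name matrix_sum is hypothetical):

```c
#include <stddef.h>

/* Sum a rows-by-cols matrix stored contiguously in row-major order.
   The caller passes a pointer to the first element plus both dimensions,
   and the function performs the index translation itself. */
double matrix_sum(const double *data, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += data[i * cols + j];
    return total;
}
```

The same function then works for any dimensions, for example matrix_sum(m, 2, 3) for a buffer declared as double m[2 * 3].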
