Allocating matrix performances

Allocating matrix performances - c

I have two scenarios, in both i allocate 78*2 sizeof(int) of memory and initialize it to 0.
Are there any differences regards performances?
Scenario A:
int ** v = calloc(2 , sizeof(int*));
for (i=0; i<2; ++i)
{
v[i] = calloc(78, sizeof(int));
}
Scenario B:
int ** v = calloc(78 , sizeof(int*));
for (i=0; i<78; ++i)
{
v[i] = calloc(2, sizeof(int));
}
I supposed that in performance terms, it's better to use a calloc if an initialize array is needed, let me know if I'm wrong

First, discussing optimization abstractly has some difficulties because compilers are becoming increasingly better at optimization. (For some reason, compiler developers will not stop improving them.) We do not always know what machine code given source code will produce, especially when we write source code today and expect it to be used for many years to come. Optimization may consolidate multiple steps into one or may omit unnecessary steps (such as clearing memory with calloc instead of malloc immediately before the memory is completely overwritten in a for loop). There is a growing difference between what source code nominally says (“Do these specific steps in this specific order”) and what it technically says in the language abstraction (“Compute the same results as this source code in some optimized fashion”).
However, we can generally figure that writing source code without unnecessary steps is at least as good as writing source code with unnecessary steps. With that in mind, let’s consider the nominal steps in your scenarios.
In Scenario A, we tell the computer:
Allocate 2 int *, clear them, and put their address in v.
Twice, allocate 78 int, clear them, and put their addresses in the preceding int *.
In Scenario B, we tell the computer:
Allocate 78 int *, clear them, and put their address in v.
78 times, allocate two int, clear them, and put their addresses in the preceding int *.
We can easily see two things:
Both of these scenarios both clear the memory for the int * and immediately fill it with other data. That is wasteful; there is no need to set memory to zero before setting it to something else. Just set it to something else. Use malloc for this, not calloc. malloc takes just one parameter for the size instead of two that are multiplied, so replace calloc(2, sizeof (int *)) with malloc(2 * sizeof (int *)). (Also, to tie the allocation to the pointer being assigned, use int **v = malloc(2 * sizeof *v); instead of repeating the type separately.)
At the step where Scenario B does 78 things, Scenario A does two things, but the code is otherwise very similar, so Scenario A has fewer steps. If both would serve some purpose, then A is likely preferable.
However, both scenarios allude to another issue. Presumably, the so-called array will be used later in the program, likely in a form like v[i][j]. Using this as a value means:
Fetch the pointer v.
Calculate i elements beyond that.
Fetch the pointer at that location.
Calculate j elements beyond that.
Fetch the int at that location.
Let’s consider a different way to define v: int (*v)[78] = malloc(2 * sizeof *v);.
This says:
Allocate space for 2 arrays of 78 int and put their address in v.
Immediately we see that involves fewer steps than Scenario A or Scenario B. But also look at what it does to the steps for using v[i][j] as a value. Because v is a pointer to an array instead of a pointer to a pointer, the computer can calculate where the appropriate element is instead of having to load an address from memory:
Fetch the pointer v.
Calculate i•78 elements beyond that.
Calculate j elements beyond that.
Fetch the int at that location.
So this pointer-to-array version is one step fewer than the pointer-to-pointer version.
Further, the pointer-to-pointer version requires an additional fetch from memory for each use of v[i][j]. Fetches from memory can be expensive relative to in-processor operations like multiplying and adding, so it is a good step to eliminate. Having to fetch a pointer can prevent a processor from predicting where the next load from memory might be based on recent patterns of use. Additionally, the pointer-to-array version puts all the elements of the 2×78 array together in memory, which can benefit the cache performance. Processors are also designed for efficient use of consecutive memory. With the pointer-to-pointer version, the separate allocations typically wind up with at least some separation between the rows and may have a lot of separation, which can break the benefits of consecutive memory use.

Related

how can allocate less memory and it still works? [duplicate]

Take the following code :
int *p = malloc(2 * sizeof *p);
p[0] = 10; //Using the two spaces I
p[1] = 20; //allocated with malloc before.
p[2] = 30; //Using another space that I didn't allocate for.
printf("%d", *(p+1)); //Correctly prints 20
printf("%d", *(p+2)); //Also, correctly prints 30
//although I didn't allocate space for it
With the line malloc(2 * sizeof *p) I am allocating space for two integers, right ? But if I add an int to the third position, I still gets allocated correctly and retrievable.
So my question is, why do you specify a size when you use malloc ?

Simple logic: If you do not park in a legal parking space, nothing might happen but occasionally your car might get towed and you might get stuck with a huge fine. And, sometimes, as you try to find your way to the pound where your car was towed, you might get run over by a truck.
malloc gives you as many legal parking spots as you asked. You can try to park elsewhere, it might seem to work, but sometimes it won't.
For questions such as this, the Memory Allocation section of the C FAQ is a useful reference to consult. See 7.3b.
On a related (humorous) note, see also a list of bloopers by ART.

C kindly let you shoot yourself in the head. You have just used random memory on the heap. With unforeseeable consequences.
Disclaimer: My last real C programing was done some 15 years ago.

Let me give you an analogy to why this "works".
Let's assume you need to draw a drawing, so you retrieve a piece of paper, lay it flat on your table, and start drawing.
Unfortunately, the paper isn't big enough, but you, not caring, or not noticing, just continue to draw your drawing.
When done, you take a step back, and look at your drawing, and it looks good, exactly as you meant it to be, and exactly the way you drew it.
Until someone comes along and picks up their piece of paper that they left on the table before you got to it.
Now there's a piece of the drawing missing. The piece you drew on that other person's paper.
Additionally, that person now has pieces of your drawing on his paper, probably messing with whatever he wanted to have on the paper instead.
So while your memory usage might appear to work, it only does so because your program finishes. Leave such a bug in a program that runs for a while and I can guarantee you that you get odd results, crashes and whatnot.
C is built like a chainsaw on steroids. There's almost nothing you cannot do. This also means that you need to know what you're doing, otherwise you'll saw right through the tree and into your foot before you know it.

You got (un)lucky. Accessing p[3] is undefined, since you haven't allocated that memory for yourself. Reading/writing off the end of an array is one of the ways that C programs can crash in mysterious ways.
For example, this might change some value in some other variable that was allocated via malloc. That means it might crash later, and it'll be very hard to find the piece of (unrelated) code that overwrote your data.
Worse yet, you might overwrite some other data and might not notice. Imagine this accidentally overwrites the amount of money you owe someone ;-)

In fact, malloc is not allocating enough space for your third integer, but you got "lucky" and your program didn't crash. You can only be sure that malloc has allocated exactly what you asked for, no more. In other words, your program wrote to a piece of memory that was not allocated to it.
So malloc needs to know the size of the memory that you need because it doesn't know what you will end up doing with the memory, how many objects you plan on writing to the memory, etc...

This all goes back to C letting you shoot yourself in the foot. Just because you can do this, doesn't mean you should. The value at p+3 is definitely not guaranteed to be what you put there unless you specifically allocated it using malloc.

Try this:
int main ( int argc, char *argv[] ) {
int *p = malloc(2 * sizeof *p);
int *q = malloc(sizeof *q);
*q = 100;
p[0] = 10; p[1] = 20; p[2] = 30; p[3] = 40;
p[4] = 50; p[5] = 60; p[6] = 70;
printf("%d\n", *q);
return 0;
}
On my machine, it prints:
50
This is because you overwrote the memory allocated for p, and stomped on q.
Note that malloc may not put p and q in contiguous memory because of alignment restrictions.

Memory is represented as an enumerable contiguous line of slots that numbers can be stored in. The malloc function uses some of these slots for its own tracking info, as well as sometimes returning slots larger than what you need, so that when you return them later it isn't stuck with an unusably small chunk of memory. Your third int is either landing on mallocs own data, on empty space leftover in the returned chunk, or in the area of pending memory that malloc has requested from the OS but not otherwise parcelled out to you yet.

Depending on the platform, p[500] would probably "work" too.

When using malloc(), you are accepting a contract with the runtime library in which you agree to ask for as much memory as you are planning to use, and it agrees to give it to you. It is the kind of all-verbal, handshake agreement between friends, that so often gets people in trouble. When you access an address outside the range of your allocation, you are violating your promise.
At that point, you have requested what the standard calls "Undefined Behavior" and the compiler and library are allowed to do anything at all in response. Even appearing to work "correctly" is allowed.
It is very unfortunate that it does so often work correctly, because this mistake can be difficult to write test cases to catch. The best approaches to testing for it involve either replacing malloc() with an implementation that keeps track of block size limits and aggressively tests the heap for its health at every opportunity, or to use a tool like valgrind to watch the behavior of the program from "outside" and discover the misuse of buffer memory. Ideally, such misuse would fail early and fail loudly.
One reason why using elements close to the original allocation often succeeds is that the allocator often gives out blocks that are related to convenient multiples of the alignment guarantee, and that often results in some "spare" bytes at the end of one allocation before the start of the next. However the allocator often store critical information that it needs to manage the heap itself near those bytes, so overstepping the allocation can result in destruction of the data that malloc() itself needs to successfully make a second allocation.
Edit: The OP fixed the side issue with *(p+2) confounded against p[1] so I've edited my answer to drop that point.

You are asking for space for two integers. p[3] assumes that you have space for 4 integers!
===================
You need to tell malloc how much you need because it can't guess how much memory you need.
malloc can do whatever it wants as long as it returns at least the amount of memory you ask for.
It's like asking for a seat in a restaurant. You might be given a bigger table than you need. Or you might be given a seat at a table with other people. Or you might be given a table with one seat. Malloc is free to do anything it wants as long as you get your single seat.
As part of the "contract" for the use of malloc, you are required to never reference memory beyond what you have asked for because you are only guaranteed to get the amount you asked for.

Because malloc() allocates in BYTES. So, if you want to allocate (for example) 2 integers you must specify the size in bytes of 2 integers. The size of an integer can be found by using sizeof(int) and so the size in bytes of 2 integers is 2 * sizeof(int). Put this all together and you get:
int * p = malloc(2 * sizeof(int));
Note: given that the above only allocates space for TWO integers you are being very naughty in assigning a 3rd. You're lucky it doesn't crash. :)

Because malloc is allocating space on the heap which is part of the memory used by your program which is dynamically allocated. The underlying OS then gives your program the requested amount (or not if you end up with some error which implies you always should check return of malloc for error condition ) of virtual memory which it maps to physical memory (ie. the chips) using some clever magic involving complex things like paging we don't want to delve into unless we are writing an OS.

When you use * (p+3), you're addressing out of bounds even with using 2*sizeof(* p), hence you're accessing an invalid memory block, perfect for seg faults.
You specify the size b/c otherwise, the function doesn't know how big of a block out of the heap memory to allocate to your program for that pointer.

The reason for the size given to malloc() is for the memory manager to keep track of how much space has been given out to each process on your system. These tables help the system to know who allocated how much space, and what addresses are free()able.
Second, c allows you to write to any part of ram at any time. Kernel's may prevent you from writing to certain sections, causing protection faults, but there is nothing preventing the programmer from attempting.
Third, in all likelyhood, malloc()ing the first time probably doesn't simply allocate 8 bytes to your process. This is implementation dependent, but it is more likely for the memory manager to allocate a full page for your use just because it is easier to allocate page size chunks....then subsequent malloc()'s would further divide the previously malloc()ed page.

As everyone has said, you're writing to memory that isn't actually allocated, meaning that something could happen to overwrite your data. To demonstrate the problem, you could try something like this:
int *p = malloc(2 * sizeof(int));
p[0] = 10; p[1] = 20; p[2] = 30;
int *q = malloc(2 * sizeof(int));
q[0] = 0; // This may or may not get written to p[2], overwriting your 30.
printf("%d", p[0]); // Correctly prints 10
printf("%d", p[1]); // Correctly prints 20
printf("%d", p[2]); // May print 30, or 0, or possibly something else entirely.
There's no way to guarantee your program will allocate space for q at p[2]. It may in fact choose a completely different location. But for a simple program like this, it seems likely, and if it does allocate q at the location where p[2] would be, it will clearly demonstrate the out-of-range error.

Do :
int *p = malloc(2 * sizeof(*p)); // wrong (if type is something greater than a machine word)
[type] *p = malloc(2 * sizeof([type])); // right.

What alignment issues limit the use of a block of memory created by malloc?

I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.
My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:
void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);
The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.
In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.
So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:
void *allocateScratch(...) {
return malloc(3 * sizeof(double) + 5 * sizeof(int));
}
void doCalculation(..., void *scratch) {
double *dblArray = scratch;
int *intArray = ((unsigned char*)scratch) + 3 * sizeof(double);
}
Is this legal? The alignment probably works out OK in this example, but what if I switch it around and take the int block first and the double block second, that will shift the alignment of the double's (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this? Or a more standard approach I should consider?
My biggest goals are as follows:
I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I don't want to do something like creating a small struct that has pointers to each block and then allocating each block separately, which would require a special destroy function; I'm willing to do that if that's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.
Maybe this issue is addressed in the C standard, but I haven't been able to find it.

The memory of a single malloc can be partitioned for use in multiple arrays as shown below.
Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:
size_t Offset = 0;
ptrdiff_t OffsetA = Offset; // Put array at current offset.
Offset += NA * sizeof(A); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(B)); // Align sufficiently for type.
ptrdiff_t OffsetB = Offset; // Put array at current offset.
Offset += NB * sizeof(B); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(C)); // Align sufficiently for type.
ptrdiff_t OffsetC = Offset; // Put array at current offset.
Offset += NC * sizeof(C); // Move offset to end of array.
unsigned char *Memory = malloc(Offset); // Allocate memory.
// Set pointers for arrays.
A *pA = Memory + OffsetA;
B *pB = Memory + OffsetB;
C *pC = Memory + OffsetC;
where RoundUp is:
// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
size_t x = Offset + Size - 1;
return x - x % Size;
}
This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.

If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design.
So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to.
If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those...
Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?

The latest C11 standard has the max_align_t type (and _Alignas specifier and _Alignof operator and <stdalign.h> header).
GCC compiler has a __BIGGEST_ALIGNMENT__ macro (giving the maximal size alignment). It also proves some extensions related to alignment.
Often, using 2*sizeof(void*) (as the biggest relevant alignment) is in practice quite safe (at least on most of the systems I heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSP-s). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. x86-64 ABI and x86 calling conventions...
And the system malloc is guaranteed to return a sufficiently aligned pointer (for all purposes).
On some systems and targets and some processors giving a larger alignment might give performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. on GCC using variable attributes...
Don't forget that according to Fulton
there is no such thing as portable software, only software that has been ported.
but intptr_t and max_align_t is here to help you....

Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).

Is there a way to tell C to never allocate memory dynamically?

I want to write a C program whose memory will be constant. It can never allocate more memory than certain amount.
int main(){
int memory[512];
int i = 5;
i = i + 5;
memory[50] = i;
};
Notice that on this example i = 5 and i = i+5 will allocate memory. I want to completely avoid the internal memory allocation procedure (which I believe is kind of slow).´Is there a way to tell C to allocate it directly on my memory array?

int memory[512]; int i = 5;
By doing this you have already allocated memory. Even if you do not fill elements after 100, there will be still total allocations of 512 ints for variable memory and 1 int for variable i.
If you need dynamic allowcation you can check malloc() or calloc().

You should change "which I believe is kind of slow" to "I am certain that it will not affect program speed"

First, the "allocation" for local ("stack") variables is very very fast, and generally happens at compile-time when the function's stack frame is laid out. There is no overhead per variable at run-time, that's just not how it works.
Second, if you want to avoid having a variable such as int i; take up further stack space, then you must avoid having that declaration in the code.
You can use the "block" you allocated but it's going to be painful:
int space[512];
space[0] = 5;
space[0] += 5;
space[50] = space[0];
The above replicates your use of i but instead manually "allocates" space[0] in the array. Of course, the above very likely generates worse code than just having a plain i, all for the sake of saving one int's worth of stack space.

Use globally declared variables only.
This however is bad advise for other obvious reasons: https://stackoverflow.com/a/176166/694576

The likely allocation cost of int i = 5; and i = i + 5; is zero. Single int variables inside methods just don't have any appreciable allocation cost. The compiler has several options here:
It will strive where possible to do the arithmetic at compile time and eliminate the variable entirely (memory[50] = 10;), meaning that at run time both the allocation and the arithmetic have no cost.
It will strive where possible to put the variable in a CPU register (zero allocation cost but still takes time for the arithmetic).
Failing that, it will reserve space for the variable on the stack, which consists of decrementing the stack pointer by 513*sizeof(int) bytes instead of 512*sizeof(int) bytes. It still won't take extra time. The OS might be obliged to fetch an extra page of memory for the thread's stack but this is more-or-less a one-time cost.

Malloc or normal array definition?

When shall i use malloc instead of normal array definition in C?
I can't understand the difference between:
int a[3]={1,2,3}
int array[sizeof(a)/sizeof(int)]
and:
array=(int *)malloc(sizeof(int)*sizeof(a));

In general, use malloc() when:
the array is too large to be placed on the stack
the lifetime of the array must outlive the scope where it is created
Otherwise, use a stack allocated array.

int a[3]={1,2,3}
int array[sizeof(a)/sizeof(int)]
If used as local variables, both a and array would be allocated on the stack. Stack allocation has its pros and cons:
pro: it is very fast - it only takes one register subtraction operation to create stack space and one register addition operation to reclaim it back
con: stack size is usually limited (and also fixed at link time on Windows)
In both cases the number of elements in each arrays is a compile-time constant: 3 is obviously a constant while sizeof(a)/sizeof(int) can be computed at compile time since both the size of a and the size of int are known at the time when array is declared.
When the number of elements is known only at run-time or when the size of the array is too large to safely fit into the stack space, then heap allocation is used:
array=(int *)malloc(sizeof(int)*sizeof(a));
As already pointed out, this should be malloc(sizeof(a)) since the size of a is already the number of bytes it takes and not the number of elements and thus additional multiplication by sizeof(int) is not necessary.
Heap allocaiton and deallocation is relatively expensive operation (compared to stack allocation) and this should be carefully weighted against the benefits it provides, e.g. in code that gets called multitude of times in tight loops.
Modern C compilers support the C99 version of the C standard that introduces the so-called variable-length arrays (or VLAs) which resemble similar features available in other languages. VLA's size is specified at run-time, like in this case:
void func(int n)
{
int array[n];
...
}
array is still allocated on the stack as if memory for the array has been allocated by a call to alloca(3).

You definately have to use malloc() if you don't want your array to have a fixed size. Depending on what you are trying to do, you might not know in advance how much memory you are going to need for a given task or you might need to dynamically resize your array at runtime, for example you might enlarge it if there is more data coming in. The latter can be done using realloc() without data loss.
Instead of initializing an array as in your original post you should just initialize a pointer to integer like.
int* array; // this variable will just contain the addresse of an integer sized block in memory
int length = 5; // how long do you want your array to be;
array = malloc(sizeof(int) * length); // this allocates the memory needed for your array and sets the pointer created above to first block of that region;
int newLength = 10;
array = realloc(array, sizeof(int) * newLength); // increase the size of the array while leaving its contents intact;

Your code is very strange.
The answer to the question in the title is probably something like "use automatically allocated arrays when you need quite small amounts of data that is short-lived, heap allocations using malloc() for anything else". But it's hard to pin down an exact answer, it depends a lot on the situation.
Not sure why you are showing first an array, then another array that tries to compute its length from the first one, and finally a malloc() call which tries do to the same.
Normally you have an idea of the number of desired elements, rather than an existing array whose size you want to mimic.
The second line is better as:
int array[sizeof a / sizeof *a];
No need to repeat a dependency on the type of a, the above will define array as an array of int with the same number of elements as the array a. Note that this only works if a is indeed an array.
Also, the third line should probably be:
array = malloc(sizeof a);
No need to get too clever (especially since you got it wrong) about the sizeof argument, and no need to cast malloc()'s return value.

A suitable replacement for *****double

I have this big data structure which is a list of lists of lists of lists of lists of doubles. Clearly it's extremely inefficient to handle. Around 70% of time spent to run my application is used to write zeros in the doubles at the end of the lists. I need a faster replacement which satisfies two constraints:
1)All the memory must be allocated continuously (that is, a huge chunk of memory)
2)I must access this chunk using the usual A[][][][][] syntax
As for now, I thought of using a *double to hold the entire chunk and reuse my list of lists of... to store pointers to the appropriate areas in the chunk.
Any better ideas?

One example of how to achieve this with a 2D array, I'm to lazy to do the 5D case, is
double **a;
a = malloc (n * sizeof(*double));
a[0] = malloc (n * m * sizeof(double));
for (int i = 1; i < n; ++i)
a[i] = a[0][i*n];
This way you can decide if you wish to index it with a[0][i*n], or a[i][j]. The memory is contiguous, and you get away with only two allocations. Of course, this also requires a free n*m*sizeof(double) block in memory, but since you demand the memory to be allocated continuously I expect this to be satisfied. This also means that you will have to delete it correctly with:
free(a[0]);
free(a);
so I would make a create5Darray (n,m,k,l,t) and a delete5Darray function to make this easier.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight