I have a problem understanding the memory usage of the following code:
#include <stdint.h>
#include <stdlib.h>
#include <conio.h>

typedef struct list {
    uint64_t ***entrys;
    int dimension;
    uint64_t len;
} list;

void init_list(list *t, uint64_t dim, uint64_t length, int amount_reg)
{
    t->dimension = dim;
    t->len = length;
    t->entrys = (uint64_t ***) malloc(sizeof(uint64_t **) * length);
    uint64_t i;
    for (i = 0; i < length; i++)
    {
        t->entrys[i] = (uint64_t **) malloc(sizeof(uint64_t *) * dim);
        int j;
        for (j = 0; j < dim; j++)
        {
            t->entrys[i][j] = (uint64_t *) malloc(sizeof(uint64_t) * amount_reg);
        }
    }
}

int main()
{
    list *table = (list *) malloc(sizeof(list));
    init_list(table, 3, 2048 * 2048, 2);
    _getch();
}
What I want to do is allocate a 3D array of uint64_t elements, like table[4194304][3][2].
The Task Manager shows a memory usage of 560 MB.
If I try to calculate the memory usage on my own, I can't comprehend that value.
Here is my calculation (for a x64 System):
2^20 * 8 bytes (first-dimension pointers)
+ 2^20 * 3 * 8 bytes (second-dimension pointers)
+ 2^20 * 3 * 2 * 8 bytes (for the values themselves)
= 2^20 * 8 bytes * 10 = 80 MB
Maybe I'm totally wrong with that calculation, or my code generates a huge amount of overhead?
If so, is there a way to make this program more memory efficient?
I can't imagine that ~2^23 uint64_t values need so much memory (since 2^23 * 8 bytes is just 64 MB).
Your code does 2²² · 4 + 1 = 16777217 calls to malloc(). For each allocated memory region, malloc() does a little bookkeeping. This adds up when you do that many calls to malloc(). You can reduce the overhead by calling malloc() fewer times like this:
void init_list(list *t, int dim, uint64_t length, int amount_reg)
{
    uint64_t ***entries = malloc(sizeof *entries * length);
    uint64_t **seconds = malloc(sizeof *seconds * length * dim);
    uint64_t *thirds = malloc(sizeof *thirds * length * dim * amount_reg);
    uint64_t i, j;

    t->entrys = entries;
    for (i = 0; i < length; i++) {
        t->entrys[i] = seconds + dim * i;
        for (j = 0; j < dim; j++)
            t->entrys[i][j] = thirds + amount_reg * j + amount_reg * dim * i;
    }
}
Here we call malloc() only three times, and memory usage goes down from 561272 KiB to 332020 KiB. Why is the memory usage still so high? Because you made a mistake in your computations. The allocations allocate this much memory:
entries: sizeof(uint64_t**) * length = 8 · 2²²
seconds: sizeof(uint64_t*) * length * dim = 8 · 2²² · 3
thirds: sizeof(uint64_t) * length * dim * amount_reg = 8 · 2²² · 3 · 2
All together we have (1 + 3 + 6) · 8 · 2²² = 335544320 bytes (327680 KiB or 320 MiB) of RAM which closely matches the amount of memory observed.
How can you reduce this amount further? Consider transposing your array so the axes are sorted in ascending order of size. This way you waste much less memory in pointers. You could also consider allocating space for the values only and doing the index computations manually. This can speed up the code a lot (fewer memory accesses) and saves memory, but is tedious to program.
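For illustration, here is a minimal sketch of that flat-allocation idea for the question's table[length][dim][amount_reg]; the idx() helper is something I made up for this example, not code from the question:

/* Sketch: allocate only the values and compute offsets by hand. */
#include <stdint.h>
#include <stdlib.h>

static inline size_t idx(size_t dim, size_t amount_reg,
                         size_t i, size_t j, size_t k)
{
    return (i * dim + j) * amount_reg + k;   /* row-major offset */
}

int main(void)
{
    size_t length = 2048 * 2048, dim = 3, amount_reg = 2;
    uint64_t *values = malloc(sizeof *values * length * dim * amount_reg);
    if (!values)
        return 1;

    values[idx(dim, amount_reg, 42, 1, 0)] = 7;   /* "table[42][1][0] = 7" */

    free(values);
    return 0;
}

This is exactly one malloc() call and no pointer tables at all, so the footprint is just the 6 * 2^22 * 8 bytes of payload.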
4194304 is not 2^20, it's more like 2^22, so your calculation is off by at least a factor of 4. And you also allocate a set of pointers to point to other data, which takes space. In your code, the first malloc allocates 2048*2048 pointers, not a single pointer to that many items.
You should also use best practice for dynamic allocation:
1) Do not cast the return value of malloc.
2) Always write expression = malloc(count * sizeof *expression); this way you can never get the sizes wrong, no matter how many pointer levels the expression has. E.g.
t->entrys = malloc(length * sizeof *t->entrys);
t->entrys[i] = malloc(dim * sizeof *t->entrys[i]);
t->entrys[i][j] = malloc(amount_reg * sizeof *t->entrys[i][j]);
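Not part of the original advice, but worth noting: each of those allocations eventually needs a matching free(). A sketch of the cleanup for the question's struct (free_list is a made-up name):

/* Sketch: free in the reverse order of allocation, using the struct
 * members (entrys, len, dimension) from the question. */
void free_list(list *t)
{
    uint64_t i;
    int j;
    for (i = 0; i < t->len; i++) {
        for (j = 0; j < t->dimension; j++)
            free(t->entrys[i][j]);
        free(t->entrys[i]);
    }
    free(t->entrys);
}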
This is part of my implementation of the k-means algorithm. I have two blocks of memory, both of equal size, such that *cluster_center is the current center of a cluster and *new_centroids represents the new centroid after taking the mean of the cluster's points:
double *cluster_center = malloc((k * dim) * sizeof(double));
double *new_centroids = malloc((k * dim) * sizeof(double));
I have the following loop to copy the results from the new_centroids to the cluster_center with no issues:
for (int i = 0; i < k; ++i) {
    memcpy(&cluster_center[i * dim], &new_centroids[i * dim], dim * sizeof(double));
}
In fact, I want to know if C has a built-in function to compare the values of both blocks, since I want to terminate my algorithm once the values of *new_centroids and *cluster_center are the same (i.e., didn't change). I really don't know how to do that.
Thank you
The function you're looking for is memcmp (memory compare). Immediately after you execute a statement:
memcpy(destination, source, size);
then
memcmp(destination, source, size);
should return zero.
I have a mass of data, maybe 4 MB. Now I want to check if all bits in it are 0.
Eg:
Here is the data:
void* data = malloc(4*1024*1024);
memset(data, 0, 4*1024*1024);
Check if all bits in it are 0. Here is my solution which is not fast enough:
int dataisnull(char* data, int length)
{
    int i = 0;
    while (i < length) {
        if (data[i]) return 0;
        i++;
    }
    return 1;
}
This code might have some things to improve in performance. For example, on a 32/64-bit machine, checking 4/8 bytes at a time may be faster.
So I wonder what is the fastest way to do it?
You can handle multiple bytes at a time and unroll the loop:
int dataisnull(const void *data, size_t length) {
    /* assuming data was returned by malloc, thus is properly aligned */
    size_t i = 0, n = length / sizeof(size_t);
    const size_t *pw = data;
    const unsigned char *pb = data;
    size_t val;
#define UNROLL_FACTOR 8
#if UNROLL_FACTOR == 8
    size_t n1 = n - n % UNROLL_FACTOR;
    for (; i < n1; i += UNROLL_FACTOR) {
        val = pw[i + 0] | pw[i + 1] | pw[i + 2] | pw[i + 3] |
              pw[i + 4] | pw[i + 5] | pw[i + 6] | pw[i + 7];
        if (val)
            return 0;
    }
#endif
    val = 0;
    for (; i < n; i++) {
        val |= pw[i];
    }
    for (i = n * sizeof(size_t); i < length; i++) {
        val |= pb[i];
    }
    return val == 0;
}
Depending on your specific problem, it might be more efficient to detect non-zero values early or late:
If the all-zero case is the most common, you should accumulate all bits into the val accumulator and test only at the end.
If the all-zero case is rare, you should check for non-zero values more often.
The unrolled version above is a compromise that tests for non-zero values every 64 or 128 bytes, depending on the size of size_t.
Depending on your compiler and processor, you might get better performance by unrolling less or more. You could also use intrinsic functions available for your particular architecture to take advantage of vector types, but it would be less portable.
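For example, a rough SSE2 sketch of the same idea (x86-specific, just one possible vector variant; the leftover tail bytes would still need the scalar loop shown above):

#include <stddef.h>
#include <emmintrin.h>  /* SSE2 */

/* Sketch: OR 16-byte chunks into an accumulator, then test it once.
 * Assumes length is a multiple of 16. */
int dataisnull_sse2(const void *data, size_t length) {
    const __m128i *p = data;
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < length / 16; i++)
        acc = _mm_or_si128(acc, _mm_loadu_si128(&p[i]));
    /* acc is all zero iff every byte of it compares equal to zero */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(acc, _mm_setzero_si128())) == 0xFFFF;
}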
Note that the code does not verify proper alignment for the data pointer:
it cannot be done portably.
it assumes the data was allocated via malloc or similar, hence properly aligned for any type.
As always, benchmark different solutions to see if it makes a real difference. This function might not be a bottleneck at all; writing a complex function to optimize a rare case is counterproductive: it makes the code less readable, more likely to contain bugs, and much less maintainable. For example, the assumption on data alignment may not hold if you change the memory allocation scheme or use static arrays; the function may then invoke undefined behavior.
The following checks if the first byte is what you want, and all subsequent pairs of bytes are the same.
int check_bytes(const char * const data, size_t length, const char val)
{
    if (length == 0) return 1;
    if (*data != val) return 0;
    return memcmp(data, data + 1, length - 1) ? 0 : 1;
}

int check_bytes64(const char * const data, size_t length, const char val)
{
    const char * const aligned64_start = (char *)((((uintptr_t)data) + 63) / 64 * 64);
    const char * const aligned64_end   = (char *)((((uintptr_t)data) + length) / 64 * 64);
    const size_t start_length     = aligned64_start - data;
    const size_t aligned64_length = aligned64_end - aligned64_start;
    const size_t end_length       = length - start_length - aligned64_length;

    if (!check_bytes(data, start_length, val)) return 0;
    if (!check_bytes(aligned64_end, end_length, val)) return 0;
    return memcmp(aligned64_start, aligned64_start + 64, aligned64_length - 64) ? 0 : 1;
}
A more elaborate version of this function should probably pass cache-line-aligned pointers to memcmp, and manually check the remaining block(s) instead of just the first byte.
Of course, you will have to profile on your specific hardware to see if there is any speed benefit of this method vs others.
If anyone doubts whether this works, ideone.
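A small usage sketch of my own (not from the answer), checking the question's 4 MB buffer:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/* check_bytes() / check_bytes64() as defined above */

int main(void)
{
    size_t len = 4 * 1024 * 1024;
    char *data = malloc(len);
    if (!data) return 1;
    memset(data, 0, len);

    printf("all zero? %d\n", check_bytes64(data, len, 0));  /* expect 1 */
    data[len - 1] = 1;
    printf("all zero? %d\n", check_bytes64(data, len, 0));  /* expect 0 */

    free(data);
    return 0;
}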
I once wrote the following function for my own use. It assumes that the data to check is a multiple of a constant chunk size and aligned properly for a buffer of machine words. If this is not given in your case, it is not hard to loop for the first and last few bytes individually and only check the bulk with the optimized function. (Strictly speaking, it is undefined behavior even if the array is properly aligned but the data has been written by any type that is incompatible with unsigned long. However, I believe that you can get pretty far with this careful breaking of the rules here.)
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool
is_all_zero_bulk(const void *const p, const size_t n)
{
    typedef unsigned long word_type;
    const size_t word_size = sizeof(word_type);
    const size_t chunksize = 8;
    assert(n % (chunksize * word_size) == 0);
    assert((((uintptr_t) p) & 0x0f) == 0);
    const word_type *const frst = (word_type *) p;
    const word_type *const last = frst + n / word_size;
    for (const word_type *iter = frst; iter != last; iter += chunksize)
    {
        word_type acc = 0;
        // Trust the compiler to unroll this loop at its own discretion.
        for (size_t j = 0; j < chunksize; ++j)
            acc |= iter[j];
        if (acc != 0)
            return false;
    }
    return true;
}
The function itself is not very smart. The main ideas are:
Use large unsigned machine words for data comparison.
Enable loop unrolling by factoring out an inner loop with a constant iteration count.
Reduce the number of branches by ORing the words into an accumulator and only comparing it every few iterations against zero.
This should also make it easy for the compiler to generate vectorized code using SIMD instructions which you really want for code like this.
Additional non-standard tweaks would be to annotate the function with __attribute__ ((hot)) and use __builtin_expect(acc != 0, false). Of course, the most important thing is to turn on your compiler's optimizations.
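For what it's worth, a sketch of what those hints might look like applied to the loop above (GCC/Clang only, same headers as the function above; purely an illustration, the hints do not change behavior):

/* GCC/Clang-specific sketch; __attribute__((hot)) and __builtin_expect are
 * only optimization hints. */
__attribute__ ((hot)) bool
is_all_zero_bulk_hinted(const void *const p, const size_t n)
{
    typedef unsigned long word_type;
    const size_t chunksize = 8;
    const word_type *const frst = (const word_type *) p;
    const word_type *const last = frst + n / sizeof(word_type);
    for (const word_type *iter = frst; iter != last; iter += chunksize)
    {
        word_type acc = 0;
        for (size_t j = 0; j < chunksize; ++j)
            acc |= iter[j];
        if (__builtin_expect(acc != 0, false))  /* non-zero words expected to be rare */
            return false;
    }
    return true;
}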
As a former C programmer and current Erlang hacker, one question has popped up for me:
How do I estimate the memory footprint of my Erlang data structures?
Let's say I had an array of 1k integers in C. Estimating the memory demand of this is easy: just the size of my array times the size of an integer. 1k 32-bit integers would take up 4 KB of memory, plus some constant amount of pointers and indexes.
In Erlang, however, estimating the memory usage is somewhat more complicated: how much memory does an entry in Erlang's array structure take up, and how do I estimate the size of a dynamically sized integer?
I have noticed that scanning over integers in an array is fairly slow in Erlang; scanning an array of about 1M integers takes almost a second in Erlang, whereas a simple piece of C code will do it in around 2 ms. This is most likely due to the amount of memory taken up by the data structure.
I'm asking this not because I'm a speed freak, but because estimating memory has, at least in my experience, been a good way of determining the scalability of software.
My test code:
first the C code:
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <time.h>
#include <queue>
#include <iostream>

class DynamicArray {
protected:
    int* array;
    unsigned int size;
    unsigned int max_size;
public:
    DynamicArray() {
        array = new int[1];
        size = 0;
        max_size = 1;
    }
    ~DynamicArray() {
        delete[] array;
    }
    void insert(int value) {
        if (size == max_size) {
            int* old_array = array;
            array = new int[size * 2];
            memcpy(array, old_array, sizeof(int) * size);
            for (int i = 0; i != size; i++)
                array[i] = old_array[i];
            max_size *= 2;
            delete[] old_array;
        }
        array[size] = value;
        size++;
    }
    inline int read(unsigned idx) const {
        return array[idx];
    }
    void print_array() {
        for (int i = 0; i != size; i++)
            printf("%d ", array[i]);
        printf("\n ");
    }
    int size_of() const {
        return max_size * sizeof(int);
    }
};

void test_array(int test) {
    printf(" %d ", test);
    clock_t t1, t2;
    t1 = clock();
    DynamicArray arr;
    for (int i = 0; i != test; i++) {
        //arr.print_array();
        arr.insert(i);
    }
    int val = 0;
    for (int i = 0; i != test; i++)
        val += arr.read(i);
    printf(" size %g MB ", (arr.size_of() / (1024 * 1024.0)));
    t2 = clock();
    float diff((float)t2 - (float)t1);
    std::cout << diff / 1000 << " ms";
    printf(" %d \n", val == ((1 + test) * test) / 2);
}

int main(int argc, char** argv) {
    int size = atoi(argv[1]);
    printf(" -- STARTING --\n");
    test_array(size);
    return 0;
}
And the Erlang code:
-module(test).
-export([go/1]).

construct_list(Arr, Idx, Idx) ->
    Arr;
construct_list(Arr, Idx, Max) ->
    construct_list(array:set(Idx, Idx, Arr), Idx + 1, Max).

sum_list(_Arr, Idx, Idx, Sum) ->
    Sum;
sum_list(Arr, Idx, Max, Sum) ->
    sum_list(Arr, Idx + 1, Max, array:get(Idx, Arr) + Sum).

go(Size) ->
    A0 = array:new(Size),
    A1 = construct_list(A0, 0, Size),
    sum_list(A1, 0, Size, 0).
Timing the C code:
bash-3.2$ g++ -O3 test.cc -o test
bash-3.2$ ./test 1000000
-- STARTING --
1000000 size 4 MB 5.511 ms 0
And the Erlang code:
1> f(Time), {Time, _} =timer:tc(test, go, [1000000]), Time/1000.0.
2189.418
First, an Erlang variable is always just a single word (32 or 64 bits depending on your machine). 2 or more bits of the word are used as a type tag. The remainder can hold an "immediate" value, such as a "fixnum" integer, an atom, an empty list ([]), or a Pid; or it can hold a pointer to data stored on the heap (tuple, list, "bignum" integer, float, etc.). A tuple has a header word specifying its type and length, followed by one word per element. A list cell on the heap uses only 2 words (its pointer already encodes the type): the head and tail elements.
For example: if A={foo,1,[]}, then A is a word pointing to a word on the heap saying "I'm a 3-tuple" followed by 3 words containing the atom foo, the fixnum 1, and the empty list, respectively. If A=[1,2], then A is a word saying "I'm a list cell pointer" pointing to the head word (containing the fixnum 1) of the first cell; and the following tail word of the cell is yet another list cell pointer, pointing to a head word containing the 2 and followed by a tail word containing the empty list. A float is represented by a header word and 8 bytes of double precision floating-point data. A bignum or a binary is a header word plus as many words as needed to hold the data. And so on. See e.g. http://stenmans.org/happi_blog/?p=176 for some more info.
To estimate size, you need to know how your data is structured in terms of tuples and lists, and you need to know the size of your integers (if too large, they will use a bignum instead of a fixnum; the limit is 28 bits incl. sign on a 32-bit machine, and 60 bits on a 64-bit machine).
Edit: https://github.com/happi/theBeamBook is a newer good resource on the internals of the BEAM Erlang virtual machine.
Is this what you want?
1> erts_debug:size([1,2]).
4
with it you can at least figure out how big a term is. The size returned is in words.
Erlang stores large integers as "arrays" of words (bignums), so you cannot really estimate their size the same way as in C; you can only predict how long your integers will be and calculate the average number of bytes needed to store them.
Check http://www.erlang.org/doc/efficiency_guide/advanced.html, and you can use the erlang:memory() function to determine the actual amount.
When I run the debugger it points to line 105 (and writes "segmentation fault" in the left corner). I don't know what the red line in the "Call stack" window means...
Please tell me what it is and where I can read more about it.
Here is the function's code:
/* Separates stereo file's samples to L and R channels. */
struct LandR sepChannels_8( unsigned char *smp, unsigned long N, unsigned char *L, unsigned char *R, struct LandR LRChannels )
{
int i;
if ( N % 2 == 0 ) // Each channel's (L,R) number of samles is 1/2 of all samples.
{
L = malloc(N / 2);
R = malloc(N / 2);
}
else
if ( N % 2 == 1 )
{
L = malloc(N + 1 / 2);
R = malloc(N + 1 / 2);
}
int m = 0;
for ( i = 0; i < N; i++ ) // separating
{
L[m] = smp[2 * i + 0]; // THIS IS THE "LINE: 105"
R[m] = smp[2 * i + 1];
m++;
}
return LRChannels;
}
And here is a screenshot of the windows (easier to show it than to try to describe it).
The line in red is your call stack: basically, it's telling you that the problem occurred inside the sepChannels_8() function, which was called from main(). You have, in fact, several bugs in your sepChannels_8() function.
Here is my analysis:
struct LandR sepChannels_8(unsigned char *smp, unsigned long N, unsigned char *L, unsigned char *R, struct LandR LRChannels)
sepChannels_8 is a function that takes five arguments of varying types and returns a value of type struct LandR. However, it's not clear what the five arguments passed to the function are for. unsigned char *smp appears to be a pointer to your audio samples, with unsigned long N being the total number of samples. But for unsigned char *L, unsigned char *R, and struct LandR LRChannels, it's not at all clear what the point is: you don't really use them. For unsigned char *L and unsigned char *R, your function promptly discards any passed-in pointers, replacing them with memory allocated using malloc(), which is then thrown away without being free()d, and the only thing you do with struct LandR LRChannels is return it unchanged.
{
    int i;
    if ( N % 2 == 0 ) // Each channel's (L,R) number of samples is 1/2 of all samples.
    {
        L = malloc(N / 2);
        R = malloc(N / 2);
    }
    else
        if ( N % 2 == 1 )
        {
            L = malloc(N + 1 / 2);
            R = malloc(N + 1 / 2);
        }
Now this is interesting: If the passed-in unsigned long, N, is an even number, you use malloc() to allocate two blocks of storage, each N / 2 in size, and assign them to L and R. If N is not even, you then double-check to see if it's an odd number, and if it is, you use malloc() to allocate two blocks of storage, each N in size, and assign them to L and R. I think you may have intended to allocate two blocks of storage that were each (N + 1) / 2 in size, but multiplication and division happen before addition and subtraction, so that's not what you get. You also fail to account for what happens if N is neither even nor odd. That's OK, because after all, that's an impossible condition... so why are you testing for the possibility?
    int m = 0;
    for ( i = 0; i < N; i++ ) // separating
    {
        L[m] = smp[2 * i + 0]; // THIS IS THE "LINE: 105"
        R[m] = smp[2 * i + 1];
        m++;
    }
Mostly pretty standard: you've got a loop, with a counter, and arrays to traverse. However, your terminating condition is wrong. You're walking down your smp data two steps at a time, and you're doing it by multiplying your array index, so your index counter needs to run from 0 to N / 2, not from 0 to N. (Also, you need to account for that last item, if N was odd...). Further, you're using m and i for the same thing at the same time. One of them is unnecessary, and redundant, and not needed, and extra.
return LRChannels;
}
And, return the LRChannels struct that was passed in to the function, unmodified. At the same time, you're discarding the L and R variables, which contain pointers to malloc()-allocated storage, now lost.
What were L and R supposed to be? It almost looks as though they're supposed to be unsigned char **, so you could give your allocated storage back to the caller by storing the pointers through them... or perhaps struct LandR has two elements that are pointers, and you were intending to save L and R in the struct before returning it? As for L, R, and LRChannels, I don't see why you're passing them to the function at all. You might as well make them all automatic variables inside the function, just as int i and int m are.
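To make that concrete, here is a minimal corrected sketch; it assumes struct LandR has two pointer members L and R, which is only a guess since the question never shows the struct definition (requires <stdlib.h>):

/* Sketch only: assumes struct LandR { unsigned char *L; unsigned char *R; }. */
struct LandR sepChannels_8(const unsigned char *smp, unsigned long N)
{
    struct LandR out;
    unsigned long pairs = N / 2;        /* number of complete L/R sample pairs */
    unsigned long i;

    out.L = malloc(pairs + (N % 2));    /* room for a possible odd leftover sample */
    out.R = malloc(pairs + (N % 2));

    for (i = 0; i < pairs; i++) {       /* index runs to N/2, not N */
        out.L[i] = smp[2 * i + 0];
        out.R[i] = smp[2 * i + 1];
    }
    if (N % 2)                          /* last unpaired sample goes to L */
        out.L[pairs] = smp[N - 1];

    return out;                         /* caller must free out.L and out.R */
}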
You have malloc'ed N/2 elements for each array, but in the loop your counter goes from 0 to N, which means you try to access elements 0 to N because you increment m on every iteration. Obviously, you will get a segfault.
What is the value of 'smp'?
It either needs to have been allocated prior to the call to sepChannels_8(), or point to a valid placeholder.
I'm trying to allocate a large block of contiguous memory in C and print it out to the user. My strategy for doing this is to create two pointers (one a pointer to double, one a pointer to pointer to double), malloc one of them to the entire size (m * n), in this case the pointer to pointer to double, then malloc the second one to the size of m. The last step is to iterate through m and perform pointer arithmetic that ensures the addresses of the doubles in the large array are stored in contiguous memory. Here is my code. But when I print out the addresses they don't seem to be contiguous (or in any sort of order). How do I print out the memory addresses of the doubles (all of them have value 0.0) correctly?
/* correct solution, with correct formatting */
/*The total number of bytes allocated was: 4
0x7fd5e1c038c0 - 1
0x7fd5e1c038c8 - 2
0x7fd5e1c038d0 - 3
0x7fd5e1c038d8 - 4*/
double **dmatrix(size_t m, size_t n);

int main(int argc, char const *argv[])
{
    int m, n, i;
    double **f;

    m = n = 2;
    i = 0;
    f = dmatrix(sizeof(m), sizeof(n));
    printf("%s %d\n", "The total number of bytes allocated was: ", m * n);
    for (i = 0; i < n * m; ++i) {
        printf("%p - %d\n ", &f[i], i + 1);
    }
    return 0;
}

double **dmatrix(size_t m, size_t n) {
    double **ptr1 = (double **)malloc(sizeof(double *) * m * n);
    double *ptr2 = (double *)malloc(sizeof(double) * m);
    int i;
    for (i = 0; i < n; i++) {
        ptr1[i] = ptr2 + m * i;
    }
    return ptr1;
}
Remember that memory is just memory. Sounds trite, but so many people seem to think of memory allocation and memory management in C as being some magic-voodoo. It isn't. At the end of the day you allocate whatever memory you need, and free it when you're done.
So start with the most basic question: If you had a need for 'n' double values, how would you allocate them?
double *d1d = calloc(n, sizeof(double));
// ... use d1d like an array (d1d[0] = 100.00, etc. ...
free(d1d);
Simple enough. Next question, in two parts, where the first part has nothing to do with memory allocation (yet):
How many double values are in a 2D array that is m*n in size?
How can we allocate enough memory to hold them all.
Answers:
There are m*n doubles in a m*n 2D-matrix of doubles
Allocate enough memory to hold (m*n) doubles.
Seems simple enough:
size_t m=10;
size_t n=20;
double *d2d = calloc(m*n, sizeof(double));
But how do we access the actual elements? A little math is in order. Knowing m and n, you can simply do this
size_t i = 3; // value you want in the major index (0..(m-1)).
size_t j = 4; // value you want in the minor index (0..(n-1)).
d2d[i*n+j] = 100.0;
Is there a simpler way to do this? In standard C, yes; in C++, no. Standard C supports a very handy capability that generates the proper code to declare dynamically-sized indexable arrays:
size_t m=10;
size_t n=20;
double (*d2d)[n] = calloc(m, sizeof(*d2d));
Can't stress this enough: Standard C supports this, C++ does NOT. If you're using C++ you may want to write an object class to do this all for you anyway, so it won't be mentioned beyond that.
So what does the above actually do? Well, first, it should be obvious we are still allocating the same amount of memory we were allocating before. That is, m*n elements, each sizeof(double) large. But you're probably asking yourself, "What is with that variable declaration?" That needs a little explaining.
There is a clear and present difference between this:
double *ptrs[n]; // declares an array of `n` pointers to doubles.
and this:
double (*ptr)[n]; // declares a pointer to an array of `n` doubles.
The compiler is now aware of how wide each row is (n doubles in each row), so we can now reference elements in the array using two indexes:
size_t m=10;
size_t n=20;
double (*d2d)[n] = calloc(m, sizeof(*d2d));
d2d[2][5] = 100.0; // does the 2*n+5 math for you.
free(d2d);
Can we extend this to 3D? Of course, the math starts looking a little weird, but it is still just offset calculations into a big'ol'block'o'ram. First the "do-your-own-math" way, indexing with [i,j,k]:
size_t l=10;
size_t m=20;
size_t n=30;
double *d3d = calloc(l*m*n, sizeof(double));
size_t i=3;
size_t j=4;
size_t k=5;
d3d[i*m*n + j*n + k] = 100.0;
free(d3d);
You need to stare at the math in that for a minute to really gel on how it computes where the double value in that big block of ram actually is. Using the above dimensions and desired indexes, the "raw" index is:
i*m*n = 3*20*30 = 1800
j*n   = 4*30    =  120
k     =             5
======================
i*m*n + j*n + k = 1925
So we're hitting the 1925th element in that big linear block. Let's do another. What about [0,1,2]?
i*m*n = 0*20*30 =  0
j*n   = 1*30    = 30
k     = 2       =  2
======================
i*m*n + j*n + k = 32
I.e. the 32nd element in the linear array.
It should be obvious by now that so long as you stay within the self-prescribed bounds of your array, i:[0..(l-1)], j:[0..(m-1)], and k:[0..(n-1)] any valid index trio will locate a unique value in the linear array that no other valid trio will also locate.
Finally, we use the same array pointer declaration like we did before with a 2D array, but extend it to 3D:
size_t l=10;
size_t m=20;
size_t n=30;
double (*d3d)[m][n] = calloc(l, sizeof(*d3d));
d3d[3][4][5] = 100.0;
free(d3d);
Again, all this really does is the same math we were doing before by hand, but letting the compiler do it for us.
I realize it may be a bit much to wrap your head around, but it is important. If it is paramount that you have contiguous-memory matrices (like feeding a matrix to a graphics rendering library such as OpenGL), you can do it relatively painlessly using the above techniques.
Finally, you might wonder why anyone would do the whole pointer-arrays-to-pointer-arrays-to-values thing in the first place if you can do it like this. A lot of reasons. Suppose you're replacing rows: swapping a pointer is easy; copying an entire row? Expensive. Suppose you're replacing an entire table dimension (m*n) in your 3D array (l*m*n): even more so, swapping a pointer is easy; copying an entire m*n table? Expensive. And the not-so-obvious answer: what if the row widths need to be independent from row to row (i.e. row0 can be 5 elements, row1 can be 6 elements)? A fixed l*m*n allocation simply doesn't work then, as sketched below.
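To illustrate that last (ragged rows) case, a minimal sketch with made-up row lengths:

#include <stdlib.h>

/* Sketch: a jagged array where each row has its own length.
 * A contiguous l*m*n block cannot represent this; per-row pointers can. */
int main(void)
{
    size_t nrows = 2;
    size_t row_len[] = { 5, 6 };                 /* row0 has 5 elements, row1 has 6 */
    double **rows = malloc(nrows * sizeof *rows);

    for (size_t r = 0; r < nrows; r++)
        rows[r] = calloc(row_len[r], sizeof *rows[r]);

    rows[1][5] = 100.0;                          /* valid only because row1 is longer */

    for (size_t r = 0; r < nrows; r++)
        free(rows[r]);
    free(rows);
    return 0;
}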
Best of luck.
Never mind, I figured it out.
/* The total number of bytes allocated was: 8
0x7fb35ac038c0 - 1
0x7fb35ac038c8 - 2
0x7fb35ac038d0 - 3
0x7fb35ac038d8 - 4
0x7fb35ac038e0 - 5
0x7fb35ac038e8 - 6
0x7fb35ac038f0 - 7
0x7fb35ac038f8 - 8 */
double ***d3darr(size_t l, size_t m, size_t n);

int main(int argc, char const *argv[])
{
    int m, n, l, i;
    double ***f;

    m = n = l = 10;
    i = 0;
    f = d3darr(sizeof(l), sizeof(m), sizeof(n));
    printf("%s %d\n", "The total number of bytes allocated was: ", m * n * l);
    for (i = 0; i < n * m * l; ++i) {
        printf("%p - %d\n ", &f[i], i + 1);
    }
    return 0;
}

double ***d3darr(size_t l, size_t m, size_t n) {
    double ***ptr1 = (double ***)malloc(sizeof(double **) * m * n * l);
    double **ptr2 = (double **)malloc(sizeof(double *) * m * n);
    double *ptr3 = (double *)malloc(sizeof(double) * m);
    int i, j;
    for (i = 0; i < l; ++i) {
        ptr1[i] = ptr2 + m * n * i;
        for (j = 0; j < l; ++j) {
            ptr2[i] = ptr3 + j * n;
        }
    }
    return ptr1;
}