I know the title is confusing, but I don't know how to describe it better, so let the code explain itself:
I have a third-party library that defines a complex scalar as
typedef struct {
float real;
float imag;
} cpx;
so a complex array/vector looks like
cpx array[10];
for (int i = 0; i < 10; i++)
{
/* array[i].real and array[i].imag are the real/imaginary parts of the i-th member */
}
The current situation is this: I have a function that takes two float arrays as arguments, and inside it I use two temporary local complex arrays, like:
void my_func(float *x, float *y) /* x is input, y is output, length is fixed, say 10 */
{
cpx tmp_cpx_A[10]; /* two local cpx array */
cpx tmp_cpx_B[10];
for (int i = 0; i < 10; i++) /* tmp_cpx_A is based on input x */
{
tmp_cpx_A[i].real = do_some_calculation(x[i]);
tmp_cpx_A[i].imag = do_some_other_calculation(x[i]);
}
some_library_function(tmp_cpx_A, tmp_cpx_B); /* tmp_cpx_B is based on tmp_cpx_A, out-of-place */
for (int i = 0; i < 10; i++) /* output y is based on tmp_cpx_B */
{
y[i] = do_final_calculation(tmp_cpx_B[i].real, tmp_cpx_B[i].imag);
}
}
I notice that after the first loop x is no longer needed, and the second loop works in place. If I could build tmp_cpx_B on the same memory as x and y, I could save half of the intermediate memory usage.
If the complex array is defined as
typedef struct{
float *real;
float *imag;
} cpx_alt;
then I can simply
cpx_alt tmp_cpx_B;
tmp_cpx_B.real = x;
tmp_cpx_B.imag = y;
and do the rest, but it is not defined that way.
I cannot change the definition of the third-party library's complex structure, and I cannot take cpx as input, because I want to hide the internal library from outside users and not break the API.
So I wonder if it is possible to initialize an array of structs with scalar members, like cpx, from scalar arrays like x and y.
Edit 1: answers to some commonly asked questions:
In practice the array length is up to 960, which means one tmp_cpx array takes 7680 bytes. My platform has 56k of RAM in total, so saving one tmp_cpx saves ~14% of memory usage.
The third-party library is KissFFT, which performs FFTs on complex arrays. It defines its own kiss_fft_cpx instead of using the standard <complex.h> because it can use a macro to switch between floating-point and fixed-point calculation.
If you want standard-compliant code, you can't reuse the memory pointed to by x and y to hold an array of cpx with the same dimension as the x/y arrays. There are several problems with that approach. The size of the x array plus the size of the y array may not equal the size of the cpx array. The x and y arrays may not be in consecutive memory. Pointer type punning is not guaranteed to work by the C standard.
So the short answer is: No, you can't.
However, if you are willing to accept code that isn't 100% standard compliant, it's very likely that it can be done. You'll have to check it very carefully on your specific system and accept that you can't move the code to another system without again checking it very carefully on that system (note: by system I mean CPU, compiler and its version, and so on).
There are some things you need to ensure:
That the x and y arrays are consecutive in memory.
That the cpx array has the same size as the two other arrays combined.
That the alignment is OK.
If that holds true, you can go for a non-standard type punning. Like:
#define SIZE 10
// Put x and y into a struct
typedef struct {
float x[SIZE];
float y[SIZE];
} xy_t;
Add some asserts to check that the memory layout is without any padding.
assert(sizeof(xy_t) == 2 * SIZE * sizeof(float));
assert(sizeof(cpx) == 2 * sizeof(float));
assert(sizeof(cpx[SIZE]) == sizeof(xy_t));
assert(alignof(cpx[SIZE]) == alignof(xy_t));
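If a C11 compiler is available, the same layout checks can be moved to compile time, so that a mismatch fails the build instead of aborting at run time. This is only a minimal sketch, assuming the toolchain supports C11 `_Static_assert` and `_Alignof`; `layout_is_compatible` is a hypothetical helper added purely for illustration.

```c
#include <stddef.h>

#define SIZE 10

typedef struct { float real; float imag; } cpx;          /* as in the question */
typedef struct { float x[SIZE]; float y[SIZE]; } xy_t;   /* as defined above   */

/* Compile-time versions of the run-time asserts (C11 assumed). */
_Static_assert(sizeof(xy_t) == 2 * SIZE * sizeof(float), "xy_t contains padding");
_Static_assert(sizeof(cpx) == 2 * sizeof(float), "cpx contains padding");
_Static_assert(sizeof(cpx[SIZE]) == sizeof(xy_t), "cpx[SIZE] and xy_t differ in size");
_Static_assert(_Alignof(cpx) == _Alignof(xy_t), "cpx and xy_t differ in alignment");

/* Hypothetical helper: the same conditions as a run-time query. */
int layout_is_compatible(void)
{
    return sizeof(cpx[SIZE]) == sizeof(xy_t) &&
           _Alignof(cpx) == _Alignof(xy_t);
}
```

If any of the conditions is false on the target, compilation stops with the given message, which makes the non-portable assumption visible when the code is moved to another system.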
In my_func change
cpx tmp_cpx_A[SIZE];
cpx tmp_cpx_B[SIZE];
to
cpx tmp_cpx_A[SIZE];
cpx* tmp_cpx_B = (cpx*)x; // Ugly, non-portable type punning
This is the "dangerous" part. Instead of defining a new array, type punning through pointer casting is used so that tmp_cpx_B points to the same memory as x (and y). This is not standard compliant but on most systems it's likely to work when the above assertions hold.
Now call the function like:
xy_t xt;
for (int i = 0; i < SIZE; i++)
{
xt.x[i] = i;
}
my_func(xt.x, xt.y);
End note: As pointed out several times, this approach is not standard compliant. So you should only do this kind of stuff if you really, really need to reduce your memory usage. And you need to check your specific system to make sure it will work on your system.
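One detail deserves care when tmp_cpx_B overlays x and y: the final loop writes y[i] into memory that also holds elements of tmp_cpx_B. With an ascending loop, writing y[i] would overwrite a complex element that has not been read yet, so the sketch below walks the array from the last index down. It relies on the same non-portable type punning as above; `fake_library_function` and `my_func_overlay` are hypothetical stand-ins, and the arithmetic is a placeholder for the real calculations.

```c
#include <stddef.h>

#define SIZE 10

typedef struct { float real; float imag; } cpx;
typedef struct { float x[SIZE]; float y[SIZE]; } xy_t;

/* Hypothetical stand-in for the out-of-place library call: copies in to out. */
static void fake_library_function(const cpx *in, cpx *out)
{
    for (int i = 0; i < SIZE; i++)
        out[i] = in[i];
}

/* buf->x holds the input on entry; buf->y holds the output on return.
   buf->x is destroyed in the process. */
void my_func_overlay(xy_t *buf)
{
    cpx tmp_cpx_A[SIZE];
    cpx *tmp_cpx_B = (cpx *)buf;   /* non-portable: overlays x and y */

    for (int i = 0; i < SIZE; i++) {
        tmp_cpx_A[i].real = buf->x[i];          /* placeholder calculation */
        tmp_cpx_A[i].imag = 2.0f * buf->x[i];   /* placeholder calculation */
    }

    fake_library_function(tmp_cpx_A, tmp_cpx_B);

    /* Descending order: y[i] lives at float index SIZE + i, which belongs to
       a complex element with index >= i, i.e. one that was already consumed. */
    for (int i = SIZE - 1; i >= 0; i--) {
        float re = tmp_cpx_B[i].real;
        float im = tmp_cpx_B[i].imag;
        buf->y[i] = re + im;                    /* placeholder calculation */
    }
}
```

Reading both members into locals before the store keeps the last iteration safe as well, since y[SIZE-1] shares its storage with tmp_cpx_B[SIZE-1].imag.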
First of all, please note that C has a standardized library for complex numbers, <complex.h>. You might want to use that one instead of some non-standard 3rd party lib.
The main problem with your code might be execution speed, not memory usage. Allocating 2 * 10 * 2 = 40 floats isn't a big deal on most systems. On the other hand, you touch the same memory area over and over again. This might be needlessly inefficient.
Consider something like this instead:
void my_func (size_t size, const float x[size], float y[size])
{
for(size_t i=0; i<size; i++)
{
cpx cpx_A =
{
.real = do_some_calculation(x[i]),
.imag = do_some_other_calculation(x[i])
};
cpx cpx_B;
// ensure that the following functions work on single variables, not arrays:
some_library_function(&cpx_A, &cpx_B);
y[i] = do_final_calculation(cpx_B.real, cpx_B.imag);
}
}
Fewer instructions and less branching. And as a bonus, less stack usage.
In theory you might also gain a few CPU cycles by restrict qualifying the parameters, though I didn't spot any improvement when I tried that on this code (gcc x86-64).
I need to implement a simple dynamic array that can work with any type.
Right now my void** implementation is ~50% slower than using int* directly:
#define N 1000000000
// Takes ~6 seconds
void** a = malloc(sizeof(void*) * N);
for (int i =0; i < N; i++) {
*(int*)(&a[i]) = i;
}
printf("%d\n", *(int*)&a[N-1]);
// Takes ~3 seconds
int* b = malloc(sizeof(int) * N) ;
for (int i =0; i < N; i++) {
b[i] = i;
}
printf("%d\n", b[N-1]);
I'm not a C expert. Is there a better way to do this?
Thanks
edit
Looks like using void** is a bad idea. Is there a way to implement this with void*?
Here's how it's implemented in Go:
type slice struct {
array unsafe.Pointer
len int
cap int
}
I'd like to do something similar.
edit2
I managed to implement this with void*.
The solution was really simple:
void* a = malloc(sizeof(int) * N);
for (int i = 0; i < N; i++) {
((int*)a)[i] = i;
}
printf("%d\n", ((int*)a)[N-1]);
Performance is the same now.
Your two alternative programs are not analogous. In the second one, which is valid, you allocate space sufficient to hold N integers, and then assign values to the int-size members of that space. In the first one, however, you allocate space large enough to accommodate N pointers to void and then, without initializing those pointers, you try to assign values to the objects to which they point. Even if those pointers had been initialized to point to int objects, there is an extra level of indirection.
Your first code could be corrected, in a sense, like so:
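/* intptr_t requires <stdint.h>; going through it avoids size-mismatch warnings */
void** a = malloc(sizeof(void*) * N);
for (int i = 0; i < N; i++) {
a[i] = (void *)(intptr_t) i;
}
printf("%d\n", (int)(intptr_t) a[N-1]);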
That relies on the fact that C allows conversions between pointer and integer types (although not necessarily without data loss), and note that there is only a single level of indirection (array indexing), not two.
Inasmuch as the behavior of your implementation of the first alternative is undefined, we can only speculate about why it runs slower in practice. If we assume a straightforward implementation, however, then such a performance penalty as you observe might arise from poor cache locality for all the array writes.
Be aware that sizeof(void *) is twice sizeof(int) on typical 64-bit platforms (8-byte addresses versus 4-byte signed integers). If that's your case, I bet the difference comes only from page and cache misses: your memory unit has to load twice as many pages, which is slow.
Please also note that C++ vectors aren't "dynamic array that can work with any type". They are bound to a type, for instance: std::vector<int> is a dynamic array but where you can only store int.
A solution to your problem would be to implement some sort of std::vector<void *> in C. But it's not efficient:
You need to do 2 allocations for each element (1 for the container and 1 for the element itself)
You need to do 2 levels of indirection each time you access the data (1 to get the pointer in the container and 1 to get the data in the element)
You need to store some kind of type information in each element. If not, you don't know what is in your dynamic array
I managed to implement this with void*.
The solution was really simple:
void* a = malloc(sizeof(int) * N);
for (int i =0;i<N;i++) {
((int*)a)[i] = i;
}
printf("%d\n", ((int*)a)[N-1]);
Performance is the same now.
I also came across this great article that explains how to implement a generic data structure in C:
http://jiten-thakkar.com/posts/writing-generic-stack-in-c
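The void*-backed approach from the edits above can be wrapped in a small struct similar to the Go slice shown in the question. The following is only a sketch under stated assumptions: the names `dynarray`, `dynarray_init`, `dynarray_push`, `dynarray_at` and `dynarray_free` are hypothetical, the growth factor of 2 and the initial capacity of 8 are arbitrary choices, and the element size is stored once per array rather than per element.

```c
#include <stdlib.h>
#include <string.h>

/* Generic growable array: one contiguous buffer, element size fixed at init. */
typedef struct {
    void  *data;       /* contiguous storage for cap elements */
    size_t elem_size;  /* size of one element in bytes */
    size_t len;        /* number of elements in use */
    size_t cap;        /* number of elements allocated */
} dynarray;

void dynarray_init(dynarray *a, size_t elem_size)
{
    a->data = NULL;
    a->elem_size = elem_size;
    a->len = 0;
    a->cap = 0;
}

/* Appends a copy of *elem; returns 0 on success, -1 if allocation fails. */
int dynarray_push(dynarray *a, const void *elem)
{
    if (a->len == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 8;
        void *p = realloc(a->data, new_cap * a->elem_size);
        if (p == NULL)
            return -1;            /* the old buffer is still valid */
        a->data = p;
        a->cap = new_cap;
    }
    memcpy((char *)a->data + a->len * a->elem_size, elem, a->elem_size);
    a->len++;
    return 0;
}

/* Returns a pointer to element i; the caller casts it to the real type. */
void *dynarray_at(const dynarray *a, size_t i)
{
    return (char *)a->data + i * a->elem_size;
}

void dynarray_free(dynarray *a)
{
    free(a->data);
    dynarray_init(a, a->elem_size);
}
```

For example, after dynarray_init(&a, sizeof(int)), an int value v is appended with dynarray_push(&a, &v) and read back with *(int *)dynarray_at(&a, i). Elements are stored by value in one buffer, so there is a single allocation and a single level of indirection, unlike an array of void* pointers.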
I have an array/RAM buffer
char* bitmap_bufer;
int bitmap_buffer_size_x;
int bitmap_buffer_size_y;
bitmap_bufer is malloc'ed (and is often realloc'ed to different
sizes too). Efficiency of access to its contents is of absolutely top importance, as I use it as the target of various rasterization / per-pixel drawing routines
bitmap_bufer[bitmap_buffer_size_x*y + x] = color; //etc
The question is: if I move it into a DLL and import it in another module
__declspec(dllimport) char* bitmap_bufer;
__declspec(dllimport) int bitmap_buffer_size_x;
__declspec(dllimport) int bitmap_buffer_size_y;
will read/write access through
bitmap_bufer[bitmap_buffer_size_x*y + x]
be slower? I suspect it may be a bit slower
(probably access through two pointers instead of one), but I'm not sure.
One would be the bitmap_bufer pointer itself and the second
would be a pointer pointing to it (?)
If so, that is sad, as in reality only one should be needed. Does someone know more about this and could explain it?
Accessing data allocated in a dynamic library is not slower, as a matter of fact, malloc itself resides in a dynamic library.
Accessing an array through a global variable may be inefficient even if it is in the process data segment. Store the pointer and the size into local variables and access the array this way:
char *bitmap = bitmap_bufer;
int span = bitmap_buffer_size_x; /* row length in pixels */
int height = bitmap_buffer_size_y;
bitmap[y * span + x] = 0;
...
If you access pixels in a row-wise manner, use a pointer to the bitmap row:
char *row = bitmap + y * span;
for (int x = 0; x < span; x++) {
row[x] = 1;
}
Also, do not use char for pixel values. Either use unsigned char for byte values between 0 and UCHAR_MAX, or use signed char for values between SCHAR_MIN and SCHAR_MAX. You might also use int8_t and uint8_t for clarity. Reserve the type char for C strings.
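Putting the last two suggestions together, a row fill might look like the sketch below. `fill_span` is a hypothetical helper, not part of the original code; it assumes the caller has already copied the global pointer and row length into local arguments and that the x range lies within the row.

```c
#include <stddef.h>

/* Fills pixels [x0, x1) of row y with color.
   bitmap and span are local copies of the global buffer pointer and row length. */
void fill_span(unsigned char *bitmap, int span, int y,
               int x0, int x1, unsigned char color)
{
    /* Compute the row base once instead of recomputing y * span per pixel. */
    unsigned char *row = bitmap + (size_t)y * (size_t)span;

    for (int x = x0; x < x1; x++)
        row[x] = color;
}
```

Because the pointer and the row length are function arguments, the compiler does not have to reload them from the imported globals on every iteration.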
I have a 2D array of size 5428x5428, and it is a symmetric array. While compiling, I get an error saying that the array size is too large. Can anyone suggest a way around this?
This array is too large for the program's stack memory. That's your error.
int main()
{
double arr[5428][5428]; // 8bytes*5428*5428 = 224MB
// ...
// use arr[y][x]
// ...
// no memory freeing needed
}
Use dynamic array allocation:
int main()
{
int i;
double ** arr;
arr = (double**)malloc(sizeof(double*)*5428);
for (i = 0; i < 5428; i++)
arr[i] = (double*)malloc(sizeof(double)*5428);
// ...
// use arr[y][x]
// ...
for (i = 0; i < 5428; i++)
free(arr[i]);
free(arr);
}
Or allocate plain array of size MxN and use ptr[y*width+x]
int main()
{
double * arr;
arr = (double*)malloc(sizeof(double)*5428*5428);
// ...
// use arr[y*5428 + x]
// ...
free(arr);
}
Or use combined method:
int main()
{
int i;
double * arr[5428]; // sizeof(double*)*5428 = 20Kb of stack for x86
for(i = 0; i < 5428; i++)
arr[i] = (double*)malloc(sizeof(double)*5428);
// ...
// use arr[y][x]
// ...
for(i = 0; i < 5428; i++)
free(arr[i]);
}
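Another variant keeps the arr[y][x] syntax while using a single allocation, by allocating through a pointer to a row type. This is a sketch that assumes the compiler supports C99 variable-length array types (they are optional from C11 onward); `identity_trace` is a hypothetical function used only to demonstrate the allocation.

```c
#include <stdlib.h>

/* Allocates an n x n matrix in one block, fills it with the identity
   matrix, and returns the sum of the diagonal (or -1.0 on failure). */
double identity_trace(size_t n)
{
    /* arr points to rows of n doubles; the whole matrix is one heap block. */
    double (*arr)[n] = malloc(n * sizeof *arr);
    if (arr == NULL)
        return -1.0;

    for (size_t y = 0; y < n; y++)
        for (size_t x = 0; x < n; x++)
            arr[y][x] = (x == y) ? 1.0 : 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += arr[i][i];

    free(arr);   /* one free matches the one malloc */
    return sum;
}
```

Only the pointer itself lives on the stack, so the size of the matrix is limited by the heap, not by the stack.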
When arrays get large, there are a number of solutions. The one that is good for you depends heavily on what you are actually doing.
I'll list a few to get you thinking:
Buy more memory.
Move your array from the stack to the heap.
The stack has tighter size limitations than the heap.
Simulate portions of the array (you say yours is symmetric, so just under 1/2 of the data is redundant).
In your case, the array is symmetric, so instead of using an array, use a "simulated array"
int getArray(array, col, row);
void setArray(array, col, row, value);
where array is a data structure that only holds the lower left half and the diagonal. getArray(..) then determines if the column is greater than the row, and if it is, it returns getArray(array, row, col) (note the reversed entries). This leverages the symmetric property of the array without the need to actually hold both symmetric sides.
Simulate the array using a list (or tree or hash table) of "only the value holding items"
This works very well for sparse arrays, as you no longer need to allocate memory to hold large numbers of zero (or empty) values. In the event that someone "looks up" a non-set value, your code "discovers" no value set for that entry, and then returns the "zero" or empty value without it actually being stored in your array.
Again without more details, it is hard to know what kind of solution is the best approach.
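The symmetric "simulated array" described above can be sketched with packed lower-triangular storage. All names here (`sym_matrix`, `sym_init`, `sym_get`, `sym_set`, `sym_free`) are hypothetical, and the element type double is an assumption. For n = 5428 this stores n(n+1)/2 = 14,734,306 values, about 118 MB instead of about 236 MB for the full array.

```c
#include <stdlib.h>

/* Symmetric n x n matrix storing only the lower triangle and the diagonal. */
typedef struct {
    size_t  n;
    double *data;   /* n * (n + 1) / 2 elements, row by row */
} sym_matrix;

/* Maps (row, col) to the packed index, swapping so that row >= col. */
static size_t sym_index(size_t row, size_t col)
{
    if (col > row) {
        size_t t = row;
        row = col;
        col = t;
    }
    return row * (row + 1) / 2 + col;
}

/* Returns 0 on success, -1 if allocation fails; elements start at zero. */
int sym_init(sym_matrix *m, size_t n)
{
    m->n = n;
    m->data = calloc(n * (n + 1) / 2, sizeof *m->data);
    return m->data ? 0 : -1;
}

double sym_get(const sym_matrix *m, size_t row, size_t col)
{
    return m->data[sym_index(row, col)];
}

void sym_set(sym_matrix *m, size_t row, size_t col, double value)
{
    m->data[sym_index(row, col)] = value;
}

void sym_free(sym_matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

Setting element (1, 3) and reading element (3, 1) touch the same storage slot, so the symmetry is enforced by construction.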
When you create local variables, they go on the stack, which is of limited size. You're blowing through that limit.
You want your array to live outside the stack. The heap can draw on all the virtual memory your system has, i.e. gigs and gigs on a modern system. There are two ways to manage that. One is to dynamically allocate the array as in k06a's answer; use malloc() or your platform-specific allocator function (e.g. GlobalAlloc() on Windows). The second is to declare the array as a global or module static variable, outside of any function, which places it in static storage rather than on the stack.
Using a global or static has the disadvantage that this memory will be allocated for the entire lifetime of your program. Also, pretty much everybody hates globals on principle. On the other hand, you can use the two-dimensional array syntax, "array[x][y]" and the like, to access array elements... easier than doing array[x + y * width], plus you don't have to remember whether you're supposed to be doing "x + y * width" or "x * height + y" .
I recently submitted a small program for an assignment that had the following two functions and a main method inside of it:
/**
* Counts the number of bits it
* takes to represent an integer a
*/
int num_bits(int a)
{
int bitCount = 0;
while(a > 0)
{
bitCount++;
a = a >> 1; //shift to the right 1 bit
}
return bitCount;
}
/**
* Converts an integer into binary representation
* stored in an array
*/
void int2bin_array(int a, int *b)
{
//stopping point in search for bits
int upper_bound = num_bits(a);
int i;
for(i = 0; i < upper_bound; i++)
{
*(b+i) = (a >> i) & 1; //store the ith bit in b[i]
}
}
int main()
{
int numBits = num_bits(exponent);
int arr[numBits]; //<- QUESTION IS ABOUT THIS LINE
int2bin_array(exponent, arr);
//do some operations on array arr
}
When my instructor returned the program he wrote a comment about the line I marked above saying that since the value of numBits isn't known until run-time, initializing an array to size numBits is a dangerous operation because the compiler won't know how much memory to allocate to array arr.
I was wondering if someone could:
1) Verify that this is a dangerous operation
2) Explain what is going on memory wise when I initialize an array like that, how does the compiler know what memory to allocate? Is there any way to determine how much memory was allocated?
Any inputs would be appreciated.
That's a C99 variable-length array. Its storage is allocated at run time on the stack (the size is not fixed by the compiler), and it is basically equivalent to
int *arr = alloca(numBits * sizeof(int));
In this case, since you know an upper bound on the size, and it is relatively small, you'd be best off with
int arr[sizeof(int)*CHAR_BIT];
This array has a size known at compile time, will always fit everything you need, and works on platforms without C99 support.
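A minimal sketch of the fixed-size variant, reusing num_bits from the question. `int2bin_fixed` is a hypothetical name for a version of int2bin_array that also returns the number of bits written; the caller is assumed to pass a buffer of at least sizeof(int) * CHAR_BIT elements.

```c
#include <limits.h>

/* Counts the bits needed to represent a positive integer a (0 for a <= 0). */
int num_bits(int a)
{
    int bit_count = 0;
    while (a > 0) {
        bit_count++;
        a >>= 1;
    }
    return bit_count;
}

/* Writes the bits of a into b, least significant first, and returns how
   many were written. b must have room for sizeof(int) * CHAR_BIT ints. */
int int2bin_fixed(int a, int *b)
{
    int upper_bound = num_bits(a);
    for (int i = 0; i < upper_bound; i++)
        b[i] = (a >> i) & 1;
    return upper_bound;
}
```

The caller declares int arr[sizeof(int) * CHAR_BIT]; once, which is 32 elements on a platform with 32-bit int, and uses only the first int2bin_fixed(exponent, arr) entries.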
It should be ok, it will just go on the stack.
The only danger is blowing out the stack.
malloc would be the normal way: then you know whether you have enough memory or not and can make informed decisions on what to do next. But in many cases it's OK to assume you can put objects that are not too big on the stack.
But strictly speaking, if you don't have enough space, this will fail badly.
I am working with a 2-dimensional array of structs which is a part of another struct. It's not something I've done a lot with so I'm having a problem. This function ends up failing after getting to the "test" for-loop near the end. It prints out one line correctly before it seg faults.
The parts of my code which read data into a dummy 2-d array of structs work just fine, so the problem must be in how I assign the array to be part of another struct (the imageStruct).
Any help would be greatly appreciated!
/*the structure of each pixel*/
typedef struct
{
int R,G,B;
}pixelStruct;
/*data for each image*/
typedef struct
{
int height;
int width;
pixelStruct *arr; /*pointer to 2-d array of pixels*/
} imageStruct;
imageStruct ReadImage(char * filename)
{
FILE *image=fopen(filename,"r");
imageStruct thisImage;
/*get header data from image*/
/*make a 2-d array of of pixels*/
pixelStruct imageArr[thisImage.height][thisImage.width];
/*Read in the image. */
/*I know this works because I after storing the image data in the
imageArr array, I printed each element from the array to the
screen.*/
/*so now I want to take the array called imageArr and put it in the
imageStruct called thisImage*/
thisImage.arr = malloc(sizeof(imageArr));
//allocate enough space in struct for the image array.
*thisImage.arr = *imageArr; /*put imageArr into the thisImage imagestruct*/
//test to see if assignment worked: (this is where it fails)
for (i = 0; i < thisImage.height; i++)
{
for (j = 0; j < thisImage.width; j++)
{
printf("\n%d: R: %d G: %d B: %d\n", i ,thisImage.arr[i][j].R,
thisImage.arr[i][j].G, thisImage.arr[i][j].B);
}
}
return thisImage;
}
(In case you are wondering why I am using a dummy array in the first place, well it's because when I started writing this code, I couldn't figure out how to do what I am trying to do now.)
EDIT: One person suggested that I didn't initialize my 2-d array correctly in the typedef for the imageStruct. Can anyone help me correct this if it is indeed the problem?
You seem to be able to create variable-length-arrays, so you're on a C99 system, or on a system that supports it. But not all compilers support those. If you want to use those, you don't need the arr pointer declaration in your struct. Assuming no variable-length-arrays, let's look at the relevant parts of your code:
/*data for each image*/
typedef struct
{
int height;
int width;
pixelStruct *arr; /*pointer to 2-d array of pixels*/
} imageStruct;
arr is a pointer to pixelStruct, and not to a 2-d array of pixels. Sure, you can use arr to access such an array, but the comment is misleading, and it hints at a misunderstanding. If you really wish to declare such a variable, you would do something like:
pixelStruct (*arr)[2][3];
and arr would be a pointer to an "array 2 of array 3 of pixelStruct", which means that arr points to a 2-d array. This isn't really what you want. To be fair, this isn't what you declare, so all is good. But your comment suggests a misunderstanding of pointers in C, and that is manifested later in your code.
At this point, you will do well to read a good introduction to arrays and pointers in C, and a really nice one is C For Smarties: Arrays and Pointers by Chris Torek. In particular, please make sure you understand the first diagram on the page and everything in the definition of the function f there.
Since you want to be able to index arr in a natural way using "column" and "row" indices, I suggest you declare arr as a pointer to pointer. So your structure becomes:
/* data for each image */
typedef struct
{
int height;
int width;
pixelStruct **arr; /* Image data of height*width dimensions */
} imageStruct;
Then in your ReadImage function, you allocate memory you need:
int i;
thisImage.arr = malloc(thisImage.height * sizeof *thisImage.arr);
for (i=0; i < thisImage.height; ++i)
thisImage.arr[i] = malloc(thisImage.width * sizeof *thisImage.arr[i]);
Note that for clarity, I haven't done any error-checking on malloc. In practice, you should check if malloc returned NULL and take appropriate measures.
Assuming all the memory allocation succeeded, you can now read your image in thisImage.arr (just like you were doing for imageArr in your original function).
Once you're done with thisImage.arr, make sure to free it:
for (i=0; i < thisImage.height; ++i)
free(thisImage.arr[i]);
free(thisImage.arr);
In practice, you will want to wrap the allocation and deallocation parts above in their respective functions that allocate and free the arr object, and take care of error-checking.
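One way to wrap the allocation and deallocation with error-checking is sketched below. The function names `alloc_image` and `free_image` are hypothetical, and the struct definitions repeat the pointer-to-pointer version suggested above so that the example is self-contained.

```c
#include <stdlib.h>

typedef struct { int R, G, B; } pixelStruct;

typedef struct {
    int height;
    int width;
    pixelStruct **arr;   /* image data of height*width dimensions */
} imageStruct;

/* Frees every row and the row table; safe on a partially built image. */
void free_image(imageStruct *img)
{
    if (img->arr != NULL) {
        for (int i = 0; i < img->height; ++i)
            free(img->arr[i]);          /* free(NULL) is a no-op */
        free(img->arr);
    }
    img->arr = NULL;
}

/* Allocates height rows of width pixels; returns 0 on success, -1 on failure. */
int alloc_image(imageStruct *img, int height, int width)
{
    img->height = height;
    img->width = width;
    img->arr = malloc((size_t)height * sizeof *img->arr);
    if (img->arr == NULL)
        return -1;

    /* Mark all rows as unallocated so cleanup after a failure is safe. */
    for (int i = 0; i < height; ++i)
        img->arr[i] = NULL;

    for (int i = 0; i < height; ++i) {
        img->arr[i] = malloc((size_t)width * sizeof *img->arr[i]);
        if (img->arr[i] == NULL) {
            free_image(img);            /* release the rows allocated so far */
            return -1;
        }
    }
    return 0;
}
```

ReadImage would call alloc_image after reading the header, read pixels directly into thisImage.arr[i][j], and leave it to the caller to call free_image when the image is no longer needed.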
I don't think sizeof imageArr works as you expect it to when you're using runtime-sized arrays. Which, btw, are a sort of "niche" C99 feature. You should add some printouts of crucial values, such as that sizeof to see if it does what you think.
Clearer would be to use explicit allocation of the array:
thisImage.arr = malloc(thisImage.width * thisImage.height * sizeof *thisImage.arr);
I also think that it's hard (if even possible) to implement a "true" 2D array like this. I would recommend just doing the address computation yourself, i.e. accessing a pixel like this:
unsigned int x = 3, y = 1; // Assume image is larger.
printf("pixel at (%u,%u) is r=%d g=%d b=%d\n", x, y,
thisImage.arr[y * thisImage.width + x].R,
thisImage.arr[y * thisImage.width + x].G,
thisImage.arr[y * thisImage.width + x].B);
I don't see how the required dimension information can be associated with an array at run-time; I don't think that's possible.
height and width are undefined; you might want to initialise them first, as in
thisImage.height = 10; thisImage.width = 20;
also,
what is colorRGB?
*thisImage.arr = *imageArr; /*put imageArr into the thisImage imagestruct*/
This won't work. You have to declare arr as colorRGB **, allocate it accordingly, etc.
It looks like you are trying to copy an array by assignment.
You cannot use simple assignment operator to do that, you have to use some function to copy things, for example memcpy.
*thisImage.arr = *imageArr;
thisImage.arr[0] = imageArr[0];
The above statements are doing the same thing.
However, this is most likely not what causes the memory corruption.
Since you are working with two-dimensional arrays, do make sure you initialize them correctly.
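The memcpy approach mentioned above could look like the following sketch, which copies a local 2-d array into a flat heap buffer. `copy_pixels` is a hypothetical helper; it assumes the compiler supports C99 variable-length array parameters and that the destination is indexed as dst[y * width + x].

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int R, G, B; } pixelStruct;

/* Returns a heap copy of src as a flat array of height*width pixels,
   or NULL if allocation fails. The caller frees the result. */
pixelStruct *copy_pixels(int height, int width,
                         pixelStruct src[height][width])
{
    size_t bytes = (size_t)height * (size_t)width * sizeof(pixelStruct);
    pixelStruct *dst = malloc(bytes);

    if (dst != NULL)
        memcpy(dst, src, bytes);   /* rows of a 2-d array are contiguous */
    return dst;
}
```

A single assignment such as *dst = *src would copy at most one element, whereas memcpy copies the whole block of height * width pixels.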
Looking at the code, it should not even compile: the array is declared as one-dimensional in your image structure, but you refer to it as two-dimensional.