Efficiency of access to an array in a DLL - C

I have an array/RAM buffer:
char* bitmap_bufer;
int bitmap_buffer_size_x;
int bitmap_buffer_size_y;
bitmap_bufer is malloc'ed (and is often realloc'ed to different sizes too). Efficiency of access to its contents is of absolutely TOP IMPORTANCE, as I use it as a target for various rasterization/per-pixel drawing routines:
bitmap_bufer[bitmap_buffer_size_x*y + x] = color; //etc
The question is: if I move it to a DLL and import it in another module
__declspec(dllimport) char* bitmap_bufer;
__declspec(dllimport) int bitmap_buffer_size_x;
__declspec(dllimport) int bitmap_buffer_size_y;
will read/write access through
bitmap_bufer[bitmap_buffer_size_x*y + x]
be slower? I suspect it may be a bit slower (probably access through two pointers rather than one), but I'm not sure. One would be the bitmap_bufer pointer itself and the second would be a pointer pointing to it (?).
If so, that is sad, as in reality only one should be needed. Does anyone know more about this and could explain it?

Accessing data allocated in a dynamic library is not slower; as a matter of fact, malloc itself resides in a dynamic library.
Accessing an array through a global variable may be inefficient even if it is in the process data segment. Store the pointer and the size into local variables and access the array this way:
char *bitmap = bitmap_bufer;
int pitch = bitmap_buffer_size_x;
int height = bitmap_buffer_size_y;
bitmap[y * pitch + x] = 0;
...
If you access pixels in a row-wise manner, use a pointer to the bitmap row:
char *row = bitmap + y * pitch;
for (int x = 0; x < pitch; x++) {
    row[x] = 1;
}
Also do not use char for pixel values, either use unsigned char for byte values between 0 and UCHAR_MAX or use signed char for values between CHAR_MIN and CHAR_MAX. You might also use int8_t and uint8_t for clarity. Reserve the type char for C strings.
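To tie the advice above together, here is a minimal sketch of a fill routine that reads each global exactly once and then works through a row pointer. The three globals are stand-ins for the variables from the question (in the real program they would be the dllimport'ed ones), and fill_bitmap is a hypothetical name. It assumes one byte per pixel and that each row is exactly bitmap_buffer_size_x bytes wide.

```c
#include <stddef.h>

/* Stand-ins for the globals from the question (assumed 1 byte per pixel). */
unsigned char *bitmap_bufer;
int bitmap_buffer_size_x;
int bitmap_buffer_size_y;

/* Hypothetical helper: fill the whole bitmap with one color. */
void fill_bitmap(unsigned char color)
{
    /* Read every global once; the loops below only touch locals. */
    unsigned char *bitmap = bitmap_bufer;
    int pitch = bitmap_buffer_size_x;   /* bytes per row */
    int height = bitmap_buffer_size_y;

    for (int y = 0; y < height; y++) {
        /* One multiplication per row instead of one per pixel. */
        unsigned char *row = bitmap + (size_t)y * pitch;
        for (int x = 0; x < pitch; x++) {
            row[x] = color;
        }
    }
}
```

Whether this is measurably faster than indexing through the globals depends on the compiler and on what else happens in the loop, so it is worth profiling both versions.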

Related

How to use scalar arrays to initialize an array of structs with scalar members?

I know the title is confusing, but I don't know how to describe it better, so let the code explain itself:
I have a third-party library that defines a complex scalar as
typedef struct {
float real;
float imag;
} cpx;
so a complex array/vector looks like
cpx array[10];
for (int i = 0; i < 10; i++)
{
/* array[i].real and array[i].imag is real/imag part of i-th member */
}
The current situation is that, in a function, I have two float arrays as arguments, and I use two temporary local complex arrays, like:
void my_func(float *x, float *y) /* x is input, y is output, length is fixed, say 10 */
{
cpx tmp_cpx_A[10]; /* two local cpx array */
cpx tmp_cpx_B[10];
for (int i = 0; i < 10; i++) /* tmp_cpx_A is based on input x */
{
tmp_cpx_A[i].real = do_some_calculation(x[i]);
tmp_cpx_A[i].imag = do_some_other_calculation(x[i]);
}
some_library_function(tmp_cpx_A, tmp_cpx_B); /* tmp_cpx_B is based on tmp_cpx_A, out-of-place */
for (int i = 0; i < 10; i++) /* output y is based on tmp_cpx_B */
{
y[i] = do_final_calculation(tmp_cpx_B[i].real, tmp_cpx_B[i].imag);
}
}
I notice that after the first loop x is no longer needed, and the second loop is in-place. If I can build tmp_cpx_B in the same memory as x and y, I can save half of the intermediate memory usage.
If the complex array were defined as
typedef struct{
float *real;
float *imag;
} cpx_alt;
then I could simply do
cpx_alt tmp_cpx_B;
tmp_cpx_B.real = x;
tmp_cpx_B.imag = y;
and do the rest, but it is not.
I cannot change the definition of the third-party library's complex structure, and I cannot take cpx as input, because I want to hide the internal library from outside users and not break the API.
So I wonder if it is possible to initialize an array of structs with scalar members, like cpx, from scalar arrays like x and y.
Edit 1: answers to some commonly asked questions:
In practice the array length is up to 960, which means one tmp_cpx array takes 7680 bytes. My platform has 56k of RAM in total, so saving one tmp_cpx saves ~14% of memory.
The third-party library is kissFFT, which does FFTs on complex arrays. It defines its own kiss_fft_cpx instead of using the standard <complex.h> because it can use a macro to switch between floating-point and fixed-point calculation.
If you want standard-compliant code, you can't reuse the memory pointed to by x and y to hold an array of cpx with the same dimension as the x/y arrays. There are several problems with that approach: the size of the x array plus the size of the y array may not equal the size of the cpx array; the x and y arrays may not be in consecutive memory; and pointer type punning is not guaranteed to work by the C standard.
So the short answer is: No, you can't.
However, if you are willing to accept code that isn't 100% standard compliant, it's very likely that it can be done. You'll have to check it very carefully on your specific system and accept that you can't move the code to another system without checking it very carefully again on that system (note: by system I mean CPU, compiler and its version, and so on).
There are some things you need to ensure:
That the x and y arrays are consecutive in memory.
That the cpx array has the same size as the two other arrays combined.
That alignment is OK.
If that holds true, you can go for non-standard type punning, like:
#define SIZE 10
// Put x and y into a struct
typedef struct {
float x[SIZE];
float y[SIZE];
} xy_t;
Add some asserts to check that the memory layout is without any padding.
assert(sizeof(xy_t) == 2 * SIZE * sizeof(float));
assert(sizeof(cpx) == 2 * sizeof(float));
assert(sizeof(cpx[SIZE]) == sizeof(xy_t));
assert(alignof(cpx[SIZE]) == alignof(xy_t));
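As a variation on the checks above, the same layout conditions can be tested at compile time, so a mismatch stops the build instead of failing at run time. This is only a sketch: it assumes a C11 compiler (for _Static_assert and <stdalign.h>), repeats the cpx and xy_t definitions so that it stands alone, and adds one extra offsetof check that is not in the original list.

```c
#include <stdalign.h>
#include <stddef.h>

#define SIZE 10

/* Same layout as the library type from the question. */
typedef struct {
    float real;
    float imag;
} cpx;

/* x and y placed in one struct so they are consecutive in memory. */
typedef struct {
    float x[SIZE];
    float y[SIZE];
} xy_t;

/* Compile-time versions of the layout checks. */
_Static_assert(sizeof(xy_t) == 2 * SIZE * sizeof(float), "xy_t contains padding");
_Static_assert(sizeof(cpx) == 2 * sizeof(float), "cpx contains padding");
_Static_assert(sizeof(cpx[SIZE]) == sizeof(xy_t), "cpx[SIZE] and xy_t differ in size");
_Static_assert(alignof(cpx) == alignof(xy_t), "alignment differs");
/* Extra check (an addition here): y starts right after x. */
_Static_assert(offsetof(xy_t, y) == SIZE * sizeof(float), "y does not follow x directly");
```

Passing these checks only confirms the memory layout. It does not make the pointer cast itself standard compliant.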
In my_func change
cpx tmp_cpx_A[SIZE];
cpx tmp_cpx_B[SIZE];
to
cpx tmp_cpx_A[SIZE];
cpx* tmp_cpx_B = (cpx*)x; // Ugly, non-portable type punning
This is the "dangerous" part. Instead of defining a new array, type punning through pointer casting is used so that tmp_cpx_B points to the same memory as x (and y). This is not standard compliant but on most systems it's likely to work when the above assertions hold.
Now call the function like:
xy_t xt;
for (int i = 0; i < SIZE; i++)
{
xt.x[i] = i;
}
my_func(xt.x, xt.y);
End note: As pointed out several times, this approach is not standard compliant. So you should only do this kind of stuff if you really, really need to reduce your memory usage. And you need to check your specific system to make sure it will work on your system.
First of all, please note that C has a standardized library for complex numbers, <complex.h>. You might want to use that one instead of some non-standard 3rd party lib.
The main problem with your code might be execution speed, not memory usage. Allocating 2 * 10 * 2 = 40 floats isn't a big deal on most systems. On the other hand, you touch the same memory area over and over again. This might be needlessly inefficient.
Consider something like this instead:
void my_func (size_t size, const float x[size], float y[size])
{
for(size_t i=0; i<size; i++)
{
cpx cpx_A =
{
.real = do_some_calculation(x[i]),
.imag = do_some_other_calculation(x[i])
};
cpx cpx_B;
// ensure that the following functions work on single variables, not arrays:
some_library_function(&cpx_A, &cpx_B);
y[i] = do_final_calculation(cpx_B.real, cpx_B.imag);
}
}
Fewer instructions and less branching. And as a bonus, less stack usage.
In theory you might also gain a few CPU cycles by restrict qualifying the parameters, though I didn't spot any improvement when I tried that on this code (gcc x86-64).

Declaring 2D integer array at specific memory location with C

I am writing in C on a BeagleBone Black MCU.
I need to create a 2D unsigned int array to store some collected analog data from two sensors.
Currently I have individual unsigned int arrays for each sensor's data but I'd like to change this to have one variable that is 2 dimensions with one dimension being the sensor the data originated from.
Here's what I have so far, and it works just fine.
#define SHARE_MEM 0x10000
#define E_RING_BUFFER_SIZE 200
volatile unsigned int *DetTSampleSet = (unsigned int *) SHARE_MEM;
volatile unsigned int *DetBSampleSet = (unsigned int *) (SHARE_MEM + (E_RING_BUFFER_SIZE * sizeof(unsigned int)));
I believe this code ensures that DetBSampleSet is located immediately after DetTSampleSet with no overlap. It works fine. I am able to use these variables like this.
int pnr;
for (pnr = 0; pnr <10;pnr++)
{
// do some stuff to get RawAnalog from sensor T.
DetTSampleSet[pnr] = RawAnalog;
// do some stuff to get RawAnalog from sensor B.
DetBSampleSet[pnr] = RawAnalog;
}
What I want is this.
int pnr;
for (pnr = 0; pnr < 10; pnr++)
{
    // do some stuff to get RawAnalog from sensor T (0)
    DetSampleSet[0][pnr] = RawAnalog;
    // do some stuff to get RawAnalog from sensor B (1)
    DetSampleSet[1][pnr] = RawAnalog;
}
I think I can just declare this as the first variable in this memory space like this.
#define SHARE_MEM 0x10000
#define E_RING_BUFFER_SIZE 200
volatile unsigned int *DetSampleSet = (unsigned int *) SHARE_MEM;
If I do, I don't think I have to worry about how this data is actually laid out in memory (that is, whether the first four bytes are DetSampleSet[0][0] and the next four bytes are DetSampleSet[0][1] or DetSampleSet[1][0]), because I don't plan to access this data through any other pointers/addresses.
However, if I want to declare another variable in memory adjacent to this variable with no overlap, do I just double the size offset, like this?
volatile unsigned int *NewIntVariableAfterFirstOne = (unsigned int *) (SHARE_MEM + (E_RING_BUFFER_SIZE * 2 * sizeof(unsigned int)));
Thanks for any and all help and your patience as I'm getting back into C after nearly 30 years.
I appreciate the comments and answers. I've tried to post a response but it seems I can't comment at length but have to add to my original question. So here goes...
So, I readily admit to getting lost sometimes in the declaration of pointers like this. The original code I posted works fine. I need to declare multiple variables in this memory space, so my primary concern is to declare them properly so as not to overwrite one; basically, to ensure the start of the next variable declared is past the end of the one declared prior. So, for example, if the pointer-to-integer variable A is to be used as a 1-D array of X elements, then X * sizeof(int) should be a safe start for the next variable, say integer variable B. Right? And if I want to use variable A as a 2-D array, then I would just use 2 * X * sizeof(int) to get the start of the next variable after A, right?
Supposing that it is a valid and appropriate thing in the first place, according to your C implementation, to create a pointer value in the way you are doing, what you need is to declare your pointer as a pointer to an array, and cast appropriately:
#define SHARE_MEM 0x10000
#define E_RING_BUFFER_SIZE 200
volatile unsigned int (*DetSampleSet)[E_RING_BUFFER_SIZE] =
(unsigned int (*)[E_RING_BUFFER_SIZE]) SHARE_MEM;
You should then be able to access the block by doubly indexing DetSampleSet, just as you say you want, with all the values for DetSampleSet[0] laid out in memory contiguously, and immediately preceding those for DetSampleSet[1].
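To see the layout without the real hardware, here is a small sketch in which an ordinary static array stands in for the fixed address SHARE_MEM (that substitution, NUM_SENSORS, and next_free_offset are assumptions made for illustration). It shows that DetSampleSet[1][0] lands exactly E_RING_BUFFER_SIZE elements after DetSampleSet[0][0], and computes the byte offset at which the next variable could start.

```c
#include <stddef.h>

#define E_RING_BUFFER_SIZE 200
#define NUM_SENSORS 2   /* assumption: two sensors, T (0) and B (1) */

/* Ordinary storage standing in for the fixed address SHARE_MEM. */
static unsigned int backing[NUM_SENSORS * E_RING_BUFFER_SIZE];

/* Pointer to rows of E_RING_BUFFER_SIZE elements, as in the answer. */
volatile unsigned int (*DetSampleSet)[E_RING_BUFFER_SIZE] =
    (volatile unsigned int (*)[E_RING_BUFFER_SIZE]) backing;

/* Byte offset from the start of the block to the first free byte after it. */
size_t next_free_offset(void)
{
    return NUM_SENSORS * E_RING_BUFFER_SIZE * sizeof(unsigned int);
}
```

On the target, the next variable would then start at SHARE_MEM + next_free_offset(), which matches the "2 * X * sizeof(int)" reasoning in the question.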
When accessing hardware, I would suggest using fixed-size integers.
You can use a pointer to an array:
#define SHARE_MEM 0x10000
#define E_RING_BUFFER_SIZE 200
typedef int32_t buff[E_RING_BUFFER_SIZE];
#define DetSampleSet ((volatile buff *)SHARE_MEM)
//example usage
int32_t foo(size_t sample, size_t sensor)
{
return DetSampleSet[sensor][sample];
}
#define is used to avoid unnecessary memory reads.
https://godbolt.org/z/c45rKvvvh
EDIT: The comments have changed the requirements. You need to change the linker script.
Add a memory area (the length has to be set by you, as I do not know how big it is):
MEMORY
{
/* the stuff which was already here */
SHAREDMEM (rw) : ORIGIN = 0x10000, LENGTH = 32M
}
In the SECTIONS block, add a new section:
.shared_mem_section (NOLOAD):
{
shared_mem_section_start = .;
KEEP(*(.shared_mem_section))
KEEP(*(.shared_mem_section*))
shared_mem_section_end = .;
} > SHAREDMEM
In your C code:
unsigned __attribute__((section(".shared_mem_section"))) DetSampleSet[NSENSORS][E_RING_BUFFER_SIZE];

Running out of RAM when adding large arrays to be manipulated in a function

I'm having a problem with the RAM limitation of my microcontroller (max of 124KB). Here is the part of my code that is giving me some headache:
// Declaring arrays (they actually need to be float)
// The three dots (...) means that there are several more numbers inside the arrays (actually there are 345 numbers inside each array).
float AR_KELON_POWER_COOL_21_FANMAX[] = {1,348,174,21,66,21,66,...,3800};
float AR_KELON_SWING_COOL_21_FANMAX[] = {1,347,173,21,65,21,65,...,3800};
float AR_KELON_COOL_18_FAN1[] = {1,348,174,21,65,21,65,21,21,...,3800};
float AR_KELON_COOL_18_FAN2[] = {1,348,173,21,66,21,66,21,...,3800};
float AR_KELON_COOL_18_FAN3[] = {1,348,173,21,66,21,66,21,20,...,3800};
float AR_KELON_COOL_18_FANMAX[] = {1,348,173,21,66,21,66,21,...,3800};
float AR_KELON_COOL_19_FAN1[] = {1,348,174,21,66,21,66,21,21,...,3800};
//Will be adding more arrays
//array of pointers
float *airConditionerCommands[] = {
AR_KELON_POWER_COOL_21_FANMAX,
AR_KELON_SWING_COOL_21_FANMAX,
AR_KELON_COOL_18_FAN1,
AR_KELON_COOL_18_FAN2,
AR_KELON_COOL_18_FAN3,
AR_KELON_COOL_18_FANMAX,
AR_KELON_COOL_19_FAN1
// I put more arrays inside here as I declare the arrays
};
void executeCommands(int data) {
int i=0;
int command=0;
for(i=0; airConditionerCommands[i] != NULL; i++) {
command = i + 67;
if(command == data) {
//This function will execute some instructions as it reads the array airConditionerCommands[i].
sendIR(airConditionerCommands[i], 400, 38);
break;
}
}
}
This program works perfectly if there are fewer than about 110 declared arrays (which are obviously also put inside *airConditionerCommands[]). But if I declare more arrays, the program won't build, because there is no more RAM left in the microcontroller.
Is there a way around this? I don't have much experience with C, but I think there is a way to allocate and free the memory the arrays are using.
Any help would be awesome. Thanks.
On microcontrollers, immutable data (i.e. data you will never even try to modify) can be stored in the program flash/ROM instead of RAM. All you need to do is declare both the array and its data elements as const or static const.
In other words, use
// Declaring arrays (they actually need to be float)
// The three dots (...) means that there are several more numbers inside the arrays (actually there are 345 numbers inside each array).
const float AR_KELON_POWER_COOL_21_FANMAX[] = {1,348,174,21,66,21,66,...,3800};
const float AR_KELON_SWING_COOL_21_FANMAX[] = {1,347,173,21,65,21,65,...,3800};
const float AR_KELON_COOL_18_FAN1[] = {1,348,174,21,65,21,65,21,21,...,3800};
const float AR_KELON_COOL_18_FAN2[] = {1,348,173,21,66,21,66,21,...,3800};
const float AR_KELON_COOL_18_FAN3[] = {1,348,173,21,66,21,66,21,20,...,3800};
const float AR_KELON_COOL_18_FANMAX[] = {1,348,173,21,66,21,66,21,...,3800};
const float AR_KELON_COOL_19_FAN1[] = {1,348,174,21,66,21,66,21,21,...,3800};
//Will be adding more arrays
If the command sequences themselves are immutable, then you can do
//array of pointers
const float *const airConditionerCommands[] = {
AR_KELON_POWER_COOL_21_FANMAX,
AR_KELON_SWING_COOL_21_FANMAX,
AR_KELON_COOL_18_FAN1,
AR_KELON_COOL_18_FAN2,
AR_KELON_COOL_18_FAN3,
AR_KELON_COOL_18_FANMAX,
AR_KELON_COOL_19_FAN1,
// I put more arrays inside here as I declare the arrays
NULL
};
too. If I checked it right, the STM32F207VCT6 only has 256kB of Flash, so you might wish to keep the pointer arrays in RAM, though. If you do this, note that not only are all the elements in it const float pointers, but the airConditionerCommands variable itself is also const. If you omit one of the consts, the uVision compiler is keen to keep the array in RAM instead.
(If you have difficulty remembering how to designate variables as const, always read them from right to left: the rightmost refers to the variable itself (unless it refers to an array, in which case it refers to the elements); progressing left, separated by *, the const refers to the pointed to object. So, const void *a means a can be changed, but the thing it points to cannot be changed; void *const a means a cannot be changed, but the thing it points to can be changed, and const void *const a means neither a nor the thing it points to can be changed.)
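As a small illustration of the right-to-left reading rule, here is a sketch with two short made-up sequences (seq_a, seq_b, commands, and count_commands are hypothetical names, and the values are shortened placeholders rather than real IR data). Both the float data and the pointer table are declared read-only, which is what allows a toolchain to place them in Flash.

```c
#include <stddef.h>

/* Read-only data: the floats themselves cannot be modified. */
static const float seq_a[] = { 1.0f, 348.0f, 3800.0f };
static const float seq_b[] = { 1.0f, 347.0f, 3800.0f };

/* Right to left: commands is an array of const pointers to const float.
   Neither the pointers nor the floats they point to can be changed. */
static const float *const commands[] = { seq_a, seq_b, NULL };

/* Count the entries before the NULL terminator. */
size_t count_commands(const float *const *table)
{
    size_t n = 0;
    while (table[n] != NULL) {
        n++;
    }
    return n;
}
```

Whether the table actually ends up in Flash still depends on the compiler and linker configuration, so it is worth checking the map file.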
If I were you, I'd also check how many unique float values you have in the arrays overall. If you have at most 256, then uint8_t or unsigned char would suffice as the command array type; if at most 65536, then uint16_t or unsigned short would work. You then use a float lookup array:
const float lookupf[] = {
0.0f,
#define FLOAT_0 0
1.0f,
#define FLOAT_1 1
21.0f,
#define FLOAT_21 2
/* ... */
348.0f,
#define FLOAT_348 72
/* ... */
3800.0f
#define FLOAT_3800 255
};
const unsigned char AR_KELON_POWER_COOL_21_FANMAX[] = {
FLOAT_1, /* 1.0f */
FLOAT_348, /* 348.0f */
/* .. */
FLOAT_3800, /* 3800.0f */
};
with sendIR modified from
void sendIR(const float *const cmds, ...)
{
int i;
for (i = 0; cmds[i] != 3800.0f; i++) {
const float cmd = cmds[i];
/* ... */
}
}
to
void sendIR(const unsigned char *const cmds, ...)
{
int i;
for (i = 0; cmds[i] != FLOAT_3800; i++) {
const float cmd = lookupf[cmds[i]];
/* ... */
}
}
You see, float is a 32-bit type, requiring four bytes per value. A uint8_t/unsigned char is an 8-bit type, requiring just a single byte per value, with the largest possible lookup array (256 entries) taking an additional 1024 bytes of Flash.
Even with const unsigned short or const uint16_t, which is a 16-bit type, you can often save some memory, since the command arrays' memory use is halved. (The lookup table still uses four additional bytes per unique float, so the total memory savings may be very small in some cases.)
Since you are already hitting the limit (over 31000 floats), the max. 256-entry lookup array could cut your RAM/Flash use to a quarter (25% of current). But it really depends on how many unique float values you need to store.
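Here is a compact, self-contained sketch of the lookup idea, using an enum for the indices instead of interleaved #defines. The value set, demo_command, decode, and command_length are all made up for illustration; the real table would hold whatever unique values your sequences contain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of unique values; each one is stored once as a float. */
static const float lookupf[] = { 1.0f, 21.0f, 66.0f, 174.0f, 348.0f, 3800.0f };

/* Index names, in the same order as lookupf. */
enum { F_1, F_21, F_66, F_174, F_348, F_3800 };

/* A command sequence stored as one byte per entry instead of four. */
static const uint8_t demo_command[] = { F_1, F_348, F_174, F_21, F_66, F_3800 };

/* Translate entry i of a command back to its float value. */
float decode(const uint8_t *cmds, size_t i)
{
    return lookupf[cmds[i]];
}

/* Number of entries up to and including the F_3800 terminator. */
size_t command_length(const uint8_t *cmds)
{
    size_t n = 0;
    while (cmds[n] != F_3800) {
        n++;
    }
    return n + 1;
}
```

With this layout a 345-entry sequence takes 345 bytes instead of 1380, plus a one-time cost of four bytes per unique value in lookupf.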
I personally do such lookup mapping by putting each command sequence into a separate text file, in a specific subdirectory, with only the numbers, one number per line. Then, I use an awk script to gather all the values into one huge lookup file. Then I use another awk script to generate the C header file declaring the arrays, including the lookup array; with data array names derived from the file names. It saves a lot of work, and is easy to set up.
Questions?
Assuming that you want to declare the array dynamically and free it when no longer necessary, you can take a look at malloc and free functions.

How can a double pointer be used for a two dimensional matrix?

I am trying my hand at C by implementing Conway's game of Life.
I am trying to dynamically build two grids (int matrices), one for the current and one for the next generation, so after I determine what the next generation looks like, I just swap pointers.
At first I tried hopelessly to define the pointer to the grid like int * grid, which you cannot subscript with a second set of brackets like [][] because - obviously - the first set of brackets returns an int.
I also tried something like int * grid[HEIGHT][WIDTH], but this gives problems assigning one pointer like this to another. (And in fact, I have no idea what this really does in memory!)
In my naïve hopefulness, I thought the following could work after stumbling across double pointers. The program compiles, but fails when running on the line indicated. (In Windows, I get no more detail other than that the Problem Event Name is APPCRASH).
DISCLAIMER: This is not the actual program, just a proof of concept for the problem.
#include <stdio.h>
#include <stdlib.h>
int HEIGHT = 20;
int WIDTH = 20;
int ** curr_gen; // Current generation
int ** next_gen; // Next generation
/* Entry Point main */
int main(int argc, char** argv) {
// Allocate memory for the grids
curr_gen = malloc(sizeof (int) * WIDTH * HEIGHT);
next_gen = malloc(sizeof (int) * WIDTH * HEIGHT);
curr_gen[0][0] = 0; //<< PROGRAM FAILS HERE
// Release heap resources
free(curr_gen);
free(next_gen);
return 0;
}
You can simply allocate the space and cast the pointer to the type which defines the column and row sizes. Looking up a pointer via [][] is expensive, and building a dynamic multi-dimensional array this way should be reserved for ragged arrays, i.e. only use it when necessary.
You can define a type:
typedef int MyArray[20][20];
And then cast the malloc pointer to the type you want:
MyArray * curr_gen = (MyArray *) malloc(...);
However, this assumes that you have a constant height and width, known at compile time. If they must be dynamic, then by all means use the index-into-a-pointer-table method. But keep in mind that the row pointer must be loaded before the element can be accessed, which can lead to pipeline stalls and potential cache misses. That can make it considerably more expensive than just doing the math yourself via [row * 20 + col].
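For the compile-time-size case, a minimal sketch of the typedef approach could look like this. Grid and grid_new are hypothetical names, and HEIGHT and WIDTH are made compile-time constants here rather than the int variables used in the question.

```c
#include <stdlib.h>

#define HEIGHT 20
#define WIDTH  20

/* One contiguous HEIGHT x WIDTH block of ints. */
typedef int Grid[HEIGHT][WIDTH];

/* Allocate one zero-initialized grid; returns NULL on failure. */
Grid *grid_new(void)
{
    return calloc(1, sizeof(Grid));
}
```

Because the result is a pointer to the whole array, cells are accessed as (*grid)[row][col], and swapping generations is just a swap of two Grid * values. Release each grid with free.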
So the real question you should ask yourself is "Does it need to run fast, or do I want the code to look 'Neat'?"
A common way to do this is described in http://c-faq.com/aryptr/dynmuldimary.html
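One variant along those lines, sketched here under the assumption that all rows have the same length, allocates a table of row pointers plus a single contiguous data block, so that grid[row][col] works with an int ** while the cells stay together in memory. The function names grid_alloc and grid_free are made up for this example.

```c
#include <stdlib.h>

/* Allocate a rows x cols grid of zeroed ints, addressable as grid[r][c].
   Returns NULL on failure or if either dimension is zero. */
int **grid_alloc(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0) {
        return NULL;
    }

    /* Table of row pointers. */
    int **grid = malloc(rows * sizeof *grid);
    if (grid == NULL) {
        return NULL;
    }

    /* One contiguous block holding every cell. */
    grid[0] = calloc(rows * cols, sizeof **grid);
    if (grid[0] == NULL) {
        free(grid);
        return NULL;
    }

    /* Point each remaining row into the block. */
    for (size_t r = 1; r < rows; r++) {
        grid[r] = grid[0] + r * cols;
    }
    return grid;
}

/* Release a grid created by grid_alloc. */
void grid_free(int **grid)
{
    if (grid != NULL) {
        free(grid[0]);
        free(grid);
    }
}
```

Two such grids can serve as the current and next generation, and swapping them is a plain exchange of the two int ** values.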
In my approach, you can just use int* as the type of the grid.
Convert the 2D position to a 1D index with a macro or a function:
#define MATRIX2INDEX(x, y, width) ((x) + (y) * (width)) // `width` is the max of x + 1 :)
int Matrix2Index(int x, int y, int width)
{
return MATRIX2INDEX(x, y, width);
}
Access the data by 2D position through the int*:
int* grid = (int*)malloc(sizeof(int) * WIDTH * HEIGHT);
grid[MATRIX2INDEX(0, 0, WIDTH)] = 0; // here: get the data you want by 2D position
free(grid); grid = NULL;

Explain the "scanline pointer array" in this image object?

I am new to C, and this struct, representing an image, is confusing to me. It's used in this Graphics Gem.
Can someone explain the proper instantiation and use of the struct, especially with regard to the scanline pointer array?
typedef unsigned char Pixel;
typedef struct {
short Hres; /* no. pixels in x direction */
short Vres; /* no. pixels in y direction */
int Size; /* size in bytes */
Pixel *i; /* pixel array */
Pixel *p[1]; /* scanline pointer array; position (x,y) given by image->p[y][x] */
} Image;
Also: is the point to avoid the multiplication implicit when indexing a 2D array? Should it not then be **p which can be allocated to Vres * sizeof(size_t) and populated with the appropriate row pointers?
Update
I think I understand. Is this example block valid?
int m, n, y, x; /* Vres, Hres, index variables */
Image *image;
image = malloc(sizeof(Image) + (m - 1) * sizeof(Pixel*));
image->Hres = n;
image->Vres = m;
image->Size = m*n*sizeof(Pixel);
image->i = malloc(image->Size);
for (y=0; y<m; y++)
{
image->p[y] = image->i + (y * n);
for (x=0; x<n; x++)
{
image->p[y][x] = 0; /* or imageSource[y][x] */
}
}
/* use image */
free(image->i);
free(image);
Finally, on modern computers (with lots of memory), does it make sense to use such a scanline pointer array rather than a 2D array? In this case the only difference would be the implicit pointer multiplication.
The last member, p, of the structure Image is used in the way a flexible array member would be, but written in C89 style.
The flexible array member is a C99 feature; before C99, people sometimes used what was called the struct hack to achieve similar behavior in C89.
Here is how to dynamically allocate an Image structure object with only one array element:
Image *bla1, *bla2;
bla1 = malloc(sizeof *bla1);
and here is how to allocate a structure object with n array elements:
bla2 = malloc(sizeof *bla2 + (n - 1) * sizeof bla2->p[0]);
After correct initialization of the Pixel * pointers in the p array, you can access the Pixel values like this:
bla2->p[y][x]
Regarding the conformity of the struct hack, C99 Rationale says that
the validity of this construct has always been questionable.
while a C Defect Report (DR #051) says that
The idiom, while common, is not strictly conforming.
Pixel *p[1] is most probably what is called a "flexible" array.
The trick is to put such arrays at the end of a struct and then allocate a block that is the size of the struct plus the total size of the additional array entries at the end. This relieves you from having to know the exact size of the array when defining the struct, you rather specify it at runtime:
Image *img = malloc(sizeof(Image) + 5 * sizeof(Pixel *));
/* img->p[5] is the last element of the array now */
Strictly speaking, you will be accessing the array beyond its bounds, but you make this operation safe by knowing that you have manually reserved enough additional memory past the end of the struct.
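For comparison, here is a sketch of the same structure written with a real C99 flexible array member, which avoids indexing past a declared bound. The field names follow the question, while image_new and image_free are hypothetical helpers and not part of the Graphics Gems code.

```c
#include <stdlib.h>

typedef unsigned char Pixel;

typedef struct {
    short Hres;   /* no. pixels in x direction */
    short Vres;   /* no. pixels in y direction */
    int Size;     /* size in bytes */
    Pixel *i;     /* pixel array */
    Pixel *p[];   /* C99 flexible array member: one pointer per scanline */
} Image;

/* Allocate a zeroed image with Vres scanline pointers; NULL on failure. */
Image *image_new(short hres, short vres)
{
    if (hres <= 0 || vres <= 0) {
        return NULL;
    }

    /* Header plus one row pointer per scanline, in a single block. */
    Image *img = malloc(sizeof *img + (size_t)vres * sizeof img->p[0]);
    if (img == NULL) {
        return NULL;
    }

    img->Hres = hres;
    img->Vres = vres;
    img->Size = (int)hres * vres * (int)sizeof(Pixel);
    img->i = calloc((size_t)img->Size, 1);
    if (img->i == NULL) {
        free(img);
        return NULL;
    }

    /* Each scanline pointer addresses the start of its row in the pixel array. */
    for (int y = 0; y < vres; y++) {
        img->p[y] = img->i + (size_t)y * hres;
    }
    return img;
}

/* Release an image created by image_new. */
void image_free(Image *img)
{
    if (img != NULL) {
        free(img->i);
        free(img);
    }
}
```

After image_new(4, 3), the pixel at (x, y) = (3, 2) is img->p[2][3], which is the same byte as img->i[2 * 4 + 3].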

Resources