I have a function called blend_pixels() whose task is to blend a single pixel onto another pixel according to the specified blending mode. That function is in turn called by pretty much any function that wants to draw anything.
The problem is that this function is called for every single pixel, which means tens of millions of times a second, and it contains a switch-case statement that goes through all possible blending modes until it finds the right one.
Obviously this is somewhat slower than calling a function that directly does the desired operation, and that's what I'm trying to fix. The parent functions that call blend_pixels() usually just pass on the blending mode they themselves received as an argument, so I can't just have them call a small function that only does one blending mode. But the choice only needs to be made once per call of the parent function (the parent functions operate on a lot of pixels per call, whereas blend_pixels() is called for every single pixel, in a loop going through all the necessary pixels).
The function looks like this:
void blend_pixels(lrgb_t *bg, lrgb_t fg, int32_t p, const int mode)
{
int32_t r, g, b;
switch (mode)
{
case SOLID:
*bg = fg;
break;
case ADD:
r = (fg.r * p >> 15) + bg->r; if (r>ONE) bg->r = ONE; else bg->r = r;
g = (fg.g * p >> 15) + bg->g; if (g>ONE) bg->g = ONE; else bg->g = g;
b = (fg.b * p >> 15) + bg->b; if (b>ONE) bg->b = ONE; else bg->b = b;
break;
case SUB:
r = -(fg.r * p >> 15) + bg->r; if (r<0) bg->r = 0; else bg->r = r;
g = -(fg.g * p >> 15) + bg->g; if (g<0) bg->g = 0; else bg->g = g;
b = -(fg.b * p >> 15) + bg->b; if (b<0) bg->b = 0; else bg->b = b;
break;
case MUL:
... // you get the idea
}
}
and is called in this kind of way:
void parent_function(lrgb_t *fb, int w, int h, lrgb_t colour, ... int blendingmode)
{
...
for (iy=y0; iy<y1; iy++)
for (ix=x0; ix<x1; ix++)
{
p = some_weighting_formula();
blend_pixels(&fb[iy*w+ix], colour, p, blendingmode);
}
}
which itself might be called like:
parent_function(fb, w, h, orange, ... /*whatever*/, ADD);
"ADD" being an integer from an enum
So clearly any switch-case to pick the blending algorithm should be done outside of parent_function's loops. But how?
You can do this with function pointers.
First define a typedef for your function pointer:
typedef void (*blend_function)(lrgb_t *, lrgb_t, int32_t);
Then break out each part of blend_pixels into its own function, each with identical parameters and return type as the typedef:
void blend_pixels_add(lrgb_t *bg, lrgb_t fg, int32_t p)
...
void blend_pixels_sub(lrgb_t *bg, lrgb_t fg, int32_t p)
...
void blend_pixels_mult(lrgb_t *bg, lrgb_t fg, int32_t p)
...
Then in your parent function, you can declare a variable of the function pointer type and assign it the address of the function you want to use:
void parent_function(lrgb_t *fb, int w, int h, lrgb_t colour, ... int blendingmode)
{
...
blend_function blend;
switch (blendingmode)
{
case ADD:
blend = blend_pixels_add;
break;
case SUB:
blend = blend_pixels_sub;
break;
...
}
for (iy=y0; iy<y1; iy++)
for (ix=x0; ix<x1; ix++)
{
p = some_weighting_formula();
blend(&fb[iy*w+ix], colour, p);
}
}
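For reference, here is a compilable sketch of the whole idea. It makes several assumptions that are not in the question: the pixel type px_t is a plain struct rather than the real lrgb_t, ONE is taken to be 4095, and the helpers clamp_one(), pick_blend() and fill_span() are made-up names. The only point is that the mode is resolved once, before the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: a plain pixel struct and ONE = 4095.
   The real lrgb_t and ONE may differ. */
typedef struct { int32_t r, g, b; } px_t;
enum { SOLID, ADD, SUB };
#define ONE 4095

typedef void (*blend_function)(px_t *, px_t, int32_t);

/* Clamp a channel value to the range [0, ONE]. */
static int32_t clamp_one(int32_t v)
{
    return v < 0 ? 0 : (v > ONE ? ONE : v);
}

static void blend_solid(px_t *bg, px_t fg, int32_t p)
{
    (void)p;            /* weight is unused in SOLID mode */
    *bg = fg;
}

static void blend_add(px_t *bg, px_t fg, int32_t p)
{
    bg->r = clamp_one(bg->r + (fg.r * p >> 15));
    bg->g = clamp_one(bg->g + (fg.g * p >> 15));
    bg->b = clamp_one(bg->b + (fg.b * p >> 15));
}

static void blend_sub(px_t *bg, px_t fg, int32_t p)
{
    bg->r = clamp_one(bg->r - (fg.r * p >> 15));
    bg->g = clamp_one(bg->g - (fg.g * p >> 15));
    bg->b = clamp_one(bg->b - (fg.b * p >> 15));
}

/* Map a mode to its function; returns NULL for an unknown mode. */
static blend_function pick_blend(int mode)
{
    switch (mode) {
    case SOLID: return blend_solid;
    case ADD:   return blend_add;
    case SUB:   return blend_sub;
    default:    return NULL;
    }
}

/* Blend one colour over n pixels. The switch runs once, not per pixel. */
static int fill_span(px_t *fb, int n, px_t colour, int32_t p, int mode)
{
    blend_function blend = pick_blend(mode);
    if (blend == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        blend(&fb[i], colour, p);
    return 0;
}
```

Returning NULL for an unknown mode avoids calling through an uninitialised pointer, which is what would happen if the switch had no default case.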
You say that the function "contains a switch-case statement going through all possible blending modes until it finds the right one". That is probably not what really happens.
Switch statements are generally compiled into what is called a jump table. In a jump table, the code does not step through all of the cases looking for the correct one, instead the argument of the switch() statement is used as the index in an array of addresses. Something like:
jump_table[SOLID] -> case SOLID address
jump_table[ADD] -> case ADD address
...
So, in this sort of implementation, a switch statement that is considering many, many values should be just as fast as a hand-coded function pointer solution because that is essentially what the compiler builds.
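To make the comparison concrete, this is roughly what a hand-written equivalent of a jump table looks like: an array of function pointers indexed by the enum value. The operations here are deliberately trivial integer stand-ins, and the names op_fn, op_table and apply are invented for this sketch.

```c
/* Modes as consecutive enum values, so they can index an array. */
enum mode { SOLID, ADD, SUB, MODE_COUNT };

typedef int (*op_fn)(int bg, int fg);

static int op_solid(int bg, int fg) { (void)bg; return fg; }
static int op_add(int bg, int fg)   { return bg + fg; }
static int op_sub(int bg, int fg)   { return bg - fg; }

/* One entry per mode: the mode is the index, so no comparisons are needed. */
static const op_fn op_table[MODE_COUNT] = {
    [SOLID] = op_solid,
    [ADD]   = op_add,
    [SUB]   = op_sub
};

/* Look up and call the operation; an out-of-range mode leaves bg unchanged. */
static int apply(enum mode m, int bg, int fg)
{
    if ((unsigned)m >= MODE_COUNT)
        return bg;
    return op_table[m](bg, fg);
}
```

Whether the compiler actually emits a jump table for a given switch depends on the compiler, the optimisation level and how dense the case values are, so it is worth checking the generated assembly before assuming either way.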
I need to do a subroutine which multiplies two numbers and shifts the final result.
Which is a more proper subroutine:
void mul_inplace_both_pointers(q23* inout, q23* in)
{
*inout = (*inout * *in);
*inout = *inout << 8;
}
or
void mul_inplace_one_pointer(q23* inout, q23 in)
{
*inout = (*inout * in);
*inout = *inout << 8;
}
or
q23 mul_no_pointers(q23 in1, q23 in2)
{
q23 out;
out = in1 * in2;
out = out << 8;
return out;
}
The code runs on a DSP processor, so it should be both speed and size optimized.
q23 a;
q23 b;
mul_inplace_both_pointers( &a, &b);
mul_inplace_one_pointer ( &a, b);
a = mul_no_pointers ( a, b);
For a minute I can omit speed and size requirement, and just ask what is the most proper from programmers point of view?
As a rule of thumb, I would minimise the number of pointers passed to a function as much as possible. That would work out to only passing a pointer if (1) the value of the argument (pointed to) needs to be changed or (2) passing by value involves overhead that is significant in terms of working of your program (e.g. a very large data structure).
So, if you must use a function like your mul_inplace_XXX() style of function, I would probably choose the mul_inplace_one_pointer() function ...
In your particular case, depending on what q23 is, I wouldn't use a function at all. I'd simply do
inout *= in;
inout <<= 8;
in the ostensible caller instead. Or, subject to testing:
inout = (inout * in) << 8;
Obviously I'm assuming none of the operations introduce undefined or otherwise (potentially) unintended behaviour.
In terms of performance, there are rarely any absolute guarantees though. Test and profile, naturally, to be sure on your target platform. Trying to optimise without testing/profiling is called "premature optimisation" for several reasons - most notably, that the code you lovingly hand-craft may not actually be optimal by your intended measures.
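If a function is wanted anyway, a by-value version marked static inline usually lets the compiler remove the call overhead. The following is only a sketch: it assumes q23 can be modelled as a plain int32_t (on the actual DSP it may be a compiler-specific type with its own intrinsics), and the name mul_shift8 is made up.

```c
#include <stdint.h>

typedef int32_t q23;  /* assumption: stand-in for the DSP's real q23 type */

/* Multiply, then scale by 2^8, returning the result by value.
   The arithmetic is done in 64 bits, and the scaling is a multiplication
   by 256 rather than << 8 because left-shifting a negative value is
   undefined in C. The caller must ensure the result fits in q23. */
static inline q23 mul_shift8(q23 in1, q23 in2)
{
    int64_t wide = (int64_t)in1 * in2 * 256;
    return (q23)wide;
}
```

The call site then reads a = mul_shift8(a, b); which makes it obvious that only a changes.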
Having a strange problem. I finally figured out how to turn a variable "i" that increments inside a loop into a string that I can pass to a function which turns it into a character that is output to a display.
The trouble is, the value increments by 2 rather than by 1! I have tried a few different things but I'm not sure where the problem lies.
Initially, I thought that perhaps, if you have a function that calls another function, and they both use the same variable to increment (i.e. for(int i=0; i<10; ++i)), then the "outer" function would have "i" increment by two because it is incremented once in the "outer" loop and once in the "inner" loop. However I think this is not the case. If it were so "i" would get incremented by more than two in my case, and I tried changing all the counter variables to different names with no change. It would be a silly way for the language to work, anyway. Unless of course it IS the case, I'd love to be enlightened.
Here is the code block giving me trouble:
for (int i=0; i<100; i++){
char c[1]={0}; // Create variable to hold character
sprintf(c,"%d", i); // Copy value of "i" as string to variable
writeText(c,0,0,WHITE,BLACK,3); // Write the character "c" at position 0,0. Size 3
OLED_buffer(); // Send display buffer
delay_ms(500); // Delay before next increment
}
Here is writeText():
void writeText(unsigned char *string, int16_t x, int16_t y, uint16_t color, uint16_t bgcolor, uint8_t size){
unsigned char letter;
for (int i=0; i<strlen(string); ++i){
letter = string[i];
if (letter != NULL){
drawChar(x+(i*6*size),y,letter,color,bgcolor,size);
}
}
}
Here is drawChar, called by writeText:
void drawChar(int16_t x, int16_t y, unsigned char c, uint16_t color, uint16_t bg, uint8_t size) {
if((x >= _width) || // Clip right
(y >= _height) || // Clip bottom
((x + 5 * size - 1) < 0) || // Clip left
((y + 8 * size - 1) < 0)) // Clip top
return;
for (int8_t i=0; i<6; i++ ) {
uint8_t line;
if (i == 5)
line = 0x0;
else
line = font[(c*5)+i];
for (int8_t j = 0; j<8; j++) {
if (line & 0x1) {
if (size == 1) // default size
drawPixel(x+i, y+j, color);
else { // big size
fillRect(x+(i*size), y+(j*size), size, size, color);
}
} else if (bg != color) {
if (size == 1) // default size
drawPixel(x+i, y+j, bg);
else { // big size
fillRect(x+i*size, y+j*size, size, size, bg);
}
}
line >>= 1;
}
}
}
And finally drawPixel, called by drawChar (though I sincerely doubt the problem goes this deep):
void drawPixel(int16_t x, int16_t y, uint16_t color) {
if ((x < 0) || (x >= width()) || (y < 0) || (y >= height()))
return;
// check rotation, move pixel around if necessary
switch (getRotation()) {
case 1:
swap(x, y);
x = WIDTH - x - 1;
break;
case 2:
x = WIDTH - x - 1;
y = HEIGHT - y - 1;
break;
case 3:
swap(x, y);
y = HEIGHT - y - 1;
break;
}
// x is which column
if (color == WHITE)
buffer[x+(y/8)*SSD1306_LCDWIDTH] |= _BV((y%8));
else
buffer[x+(y/8)*SSD1306_LCDWIDTH] &= ~_BV((y%8));
}
The result of all this is that the display shows a number that increments by twice the length of my delay. So for instance the delay here is 500ms, so it updates every 1 second. Rather than going
1, 2, 3, 4, 5...
as it should, it goes
1, 3, 5, 7, 9...
Does anyone have any advice to offer? I'm sure that it is some stupid simple problem in my initial loop, but I just can't see it right now.
I am using Atmel Studio 6 to program an xmega32a4u. The library functions shown are part of the Adafruit graphics library for the SSD1306 128x32 OLED that I ported into Atmel Studio.
Thanks so much for the help!
UPDATE: Although my code did have some problems, the real issue was actually with the way the OLED was addressed. Apparently Adafruit forgot to set the correct page address in their libraries for the display. As the controller on the display can support a 128x64 as well as a 128x32 display, the "end" address for the display must be set correctly so that the controller knows which parts of the display RAM to access. That function was missing. Because of how the display writes the data ram and because it didn't "know" that the display was only 32 pixels tall, every other frame sent to the display was actually being written to the "bottom" part of the display ram (i.e. the part that WOULD appear if the display was 128x64, twice as tall). So now everything works great!
A big thanks to unwind, if not for his suggestion about the display timing getting me thinking about that side of the issue, it might have taken me a long time to figure out the problem.
You have a buffer overrun.
This:
char c[1]={0}; // Create variable to hold character
sprintf(c,"%d", i);
is not allocating enough room in the string buffer c to hold a single-digit string. Remember that strings in C are 0-terminated, so a 1-digit string requires 2 characters. Since your loop goes to 100, you will eventually write 3 + 1 characters to the buffer, overwriting even more. Not sure how you imagined this to work.
It's likely that sprintf() is overwriting your loop index variable, although anything could happen since you're hitting undefined behavior.
Change those two lines to:
char c[8];
sprintf(c, "%d", i);
or, if you have it, use snprintf():
snprintf(c, sizeof c, "%d", i);
to get protection against buffer overrun.
If you just want the least significant digit of i, do something like this:
snprintf(c, sizeof c, "%d", i % 10);
This uses the modulo (% in C) operator to compute the remainder when dividing by 10, which is the "ones" digit.
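snprintf() also tells you when the output did not fit: it returns the number of characters it would have written, so a return value greater than or equal to the buffer size means truncation. A small sketch of a checked wrapper (format_int is a made-up helper name, not part of any library):

```c
#include <stdio.h>

/* Format value as decimal text into buf.
   Returns 0 on success, -1 if the text (plus terminator) did not fit. */
static int format_int(char *buf, size_t size, int value)
{
    int n = snprintf(buf, size, "%d", value);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With a two-byte buffer, format_int() succeeds for a single digit such as i % 10 and reports failure for 100 instead of overrunning the buffer.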
UPDATE After reading your comments, I'm inclined to believe that your problem is one of timing, maybe the display contents aren't refreshed when you expect them to be, so that you only see every second "frame" you build. You should be able to use a debugger to pretty easily see that you do indeed build and display each numeric value, by breaking after the sprintf() in the root loop.
UPDATE 2: Just since it bothered me, your writeText() function can be simplified quite a lot, the comparison of a character against NULL is weird (NULL is a pointer, not a character) and pointless since you've already checked with strlen():
void writeText(const unsigned char *string, int16_t x, int16_t y, uint16_t color,
               uint16_t bgcolor, uint8_t size)
{
    while(*string != '\0')
    {
        drawChar(x, y, *string, color, bgcolor, size);
        x += 6 * size;  // advance by one character cell (6 columns, scaled)
        ++string;
    }
}
Also note the const; functions that take pointers to data that the functions only read should always be declared const.
I use a struct of bit fields to access each colour channel in a pixel, the problem is that quite often I have code that applies in the same way to each channel, but because I cannot just iterate over the members of a struct in C I end up having 3 copies of the same code for each member, or more inconveniently have to use switch-case statements.
I figured it would be more elegant if I could use a macro so that I can access a member by providing a number, ideally a macro that would make .CHAN(i) become either .r, .g or .b depending on whether the integer variable i contains a 0, 1 or 2. Except I have no idea how one would make such a macro or even if that's possible.
A detail but each member is something like 12 bits, not 8 as one might expect, so I cannot just turn it into an array or have a union with a pointer. Also X-Macros won't do as I often need to do many things to each channel before doing the same to another channel, in other words the for loop for going through each member can contain a lot more than just one line.
EDIT: Here's some code, first the struct:
typedef struct
{
uint32_t b:12;
uint32_t g:12;
uint32_t r:12;
uint32_t a:12;
} lrgb_t;
Now an example of what my problem looks like in code:
for (ic=0; ic<3; ic++)
{
for (i=0; i<curvecount; i++)
{
curve[i].p0.x = (double) i;
curve[i].p3.x = (double) i+1.;
switch (ic) // this is what I'm trying to eliminate
{
case 0:
curve[i].p0.y = pancol[i].r / 4095.;
curve[i].p3.y = pancol[i+1].r / 4095.;
break;
case 1:
curve[i].p0.y = pancol[i].g / 4095.;
curve[i].p3.y = pancol[i+1].g / 4095.;
break;
case 2:
curve[i].p0.y = pancol[i].b / 4095.;
curve[i].p3.y = pancol[i+1].b / 4095.;
break;
}
// Ideally this would be replaced by something like this, CHAN() being an hypothetical macro
// curve[i].p0.y = pancol[i].CHAN(ic) / 4095.;
// curve[i].p3.y = pancol[i+1].CHAN(ic) / 4095.;
}
... // more stuff that ultimately results in a bunch of pixels being written, channel after channel
}
As pointed out in the comments, this doesn't really address the OP's problem, because the members of his struct are bit-fields that wouldn't line up with an array. I'll keep the answer here, though, in the hope that it can still be useful to someone.
I think a union is what you want.
You can write your struct such as
union
{
struct
{
float r;
float g;
float b;
}rgb;
float channel[3];
} color;
This way the struct will be in the same place in memory as the float[3], and you can effectively access the same members as either a struct member or as an element in the array.
You might have to look up the exact syntax, but you get the idea.
One possibility might be to wrap the repeated code into a function, and then call it for each of the channels:
typedef struct {
int r:12;
int g:12;
int b:12;
} Pixel;
int inc(int val) {
return val + 1;
}
int main(void) {
Pixel p = {0, 0, 0};
p.r = inc(p.r);
p.g = inc(p.g);
p.b = inc(p.b);
return 0;
}
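The same idea can be pushed a little further with a pair of accessor functions that select the channel by index, which gets close to the CHAN(i) the question asks for while still working with bit-fields. This is a sketch only: get_chan and set_chan are made-up names, and the mapping 0 = r, 1 = g, anything else = b is an assumption.

```c
#include <stdint.h>

/* Same layout as the struct in the question. */
typedef struct
{
    uint32_t b:12;
    uint32_t g:12;
    uint32_t r:12;
    uint32_t a:12;
} lrgb_t;

/* Read channel ic (0 = r, 1 = g, otherwise b). */
static unsigned get_chan(const lrgb_t *px, int ic)
{
    switch (ic) {
    case 0:  return px->r;
    case 1:  return px->g;
    default: return px->b;
    }
}

/* Write channel ic (0 = r, 1 = g, otherwise b); v is truncated to 12 bits. */
static void set_chan(lrgb_t *px, int ic, unsigned v)
{
    switch (ic) {
    case 0:  px->r = v; break;
    case 1:  px->g = v; break;
    default: px->b = v; break;
    }
}
```

The inner loop body from the question would then read curve[i].p0.y = get_chan(&pancol[i], ic) / 4095.; and similarly for p3. The switch is still there, but it lives in one place.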
After reading the code that you added I made some changes to my suggested macro
#define CHAN(ic) \
{ \
if ((ic) == 1) { curve[i].p0.y = pancol[i].r / 4095.; curve[i].p3.y = pancol[i+1].r / 4095.; } \
else if ((ic) == 2) { curve[i].p0.y = pancol[i].g / 4095.; curve[i].p3.y = pancol[i+1].g / 4095.; } \
else { curve[i].p0.y = pancol[i].b / 4095.; curve[i].p3.y = pancol[i+1].b / 4095.; } \
}
The macro CHAN(ic) evaluates 'ic' to decide which member to use. If 'ic' is 1, the member '.r' is used; if 'ic' is 2, '.g' is used; and if 'ic' is neither 1 nor 2, '.b' is used. Because of this assumption you must make sure that 'ic' is properly set, otherwise the '.b' branch is silently taken and pancol[i].b and pancol[i+1].b are read instead. Your code should look something like the following, but you will most likely need to tweak the macro a bit. Let me know if you have any questions.
//#define CHAN(ic) here
for (ic=0; ic<3; ic++)
{
for (i=0; i<curvecount; i++)
{
curve[i].p0.x = (double) i;
curve[i].p3.x = (double) i+1.;
CHAN(ic)
}
... // more stuff that ultimately results in a bunch of pixels being written, channel after channel
}
Also please note that my macro does exactly the same thing as your switch-case. The only difference is that it is defined in a macro; the point I am trying to make is that the difference between the switch-case and the macro is purely visual.
I have a problem with a series of functions. I have an array of 'return values' (I compute them through matrices) from a single function sys, which depends on an integer variable, let's say j, and I want to return them according to this j. I mean, if I want equation number j, I just write sys(j).
For this I used a for loop, but I don't know if it's well defined, because when I run my code I don't get the right values.
Is there a better way to have an array of functions and call them in an easy way? That would make it easier to work with a function in a Runge-Kutta method to solve a differential equation.
Here is that part of the code (c is just the integer j I used in the explanation above):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int N=3;
double s=10.;
//float r=28.;
double b=8.0/3.0;
/* Define functions */
double sys(int c,double r,double y[])
{
int l,m,n,p=0;
double tmp;
double t[3][3]={0};
double j[3][3]={{-s,s,0},{r-y[2],-1,-y[0]},{y[1],y[0],-b}}; // Jacobian
double id[3][3] = { {y[3],y[6],y[9]} , {y[4],y[7],y[10]} , {y[5],y[8],y[11]} };
double flat[N*(N+1)];
// Multiplication of matrices J * Y
for(l=0;l<N;l++)
{
for(m=0;m<N;m++)
{
for(n=0;n<N;n++)
{
t[l][m] += j[l][n] * id[n][m];
}
}
}
// Transpose the matrix: (J * Y) -> (J * Y)^T
for(l=0;l<N;l++)
{
for(m=l+1;m<N;m++)
{
tmp = t[l][m];
t[l][m] = t[m][l];
t[m][l] = tmp;
}
}
// We flatten the array to be left in one array
for(l=0;l<N;l++)
{
for(m=0;m<N;m++)
{
flat[p+N] = t[l][m];
}
}
flat[0] = s*(y[1]-y[0]);
flat[1] = y[0]*(r-y[2])-y[1];
flat[2] = y[0]*y[1]-b*y[2];
for(l=0;l<(N*(N+1));l++)
{
if(c==l)
{
return flat[c];
}
}
}
EDIT ----------------------------------------------------------------
OK, this is the part of the code where I use the function:
int main(){
output = fopen("lyapcoef.dat","w");
int j,k;
int N2 = N*N;
int NN = N*(N+1);
double r;
double rmax = 29;
double t = 0;
double dt = 0.05;
double tf = 50;
double z[NN]; // Temporary matrix for RK4
double k1[N2],k2[N2],k3[N2],k4[N2];
double y[NN]; // Matrix for all variables
/* Initial conditions */
double u[N];
double phi[N][N];
double phiu[N];
double norm;
double lyap;
//Here we integrate the system using Runge-Kutta of fourth order
for(r=28;r<rmax;r++){
y[0]=19;
y[1]=20;
y[2]=50;
for(j=N;j<NN;j++) y[j]=0;
for(j=N;j<NN;j=j+3) y[j]=1; // Identity matrix for y from 3 to 11
while(t<tf){
/* RK4 step 1 */
for(j=0;j<NN;j++){
k1[j] = sys(j,r,y)*dt;
z[j] = y[j] + k1[j]*0.5;
}
/* RK4 step 2 */
for(j=0;j<NN;j++){
k2[j] = sys(j,r,z)*dt;
z[j] = y[j] + k2[j]*0.5;
}
/* RK4 step 3 */
for(j=0;j<NN;j++){
k3[j] = sys(j,r,z)*dt;
z[j] = y[j] + k3[j];
}
/* RK4 step 4 */
for(j=0;j<NN;j++){
k4[j] = sys(j,r,z)*dt;
}
/* Updating y matrix with new values */
for(j=0;j<NN;j++){
y[j] += (k1[j]/6.0 + k2[j]/3.0 + k3[j]/3.0 + k4[j]/6.0);
}
printf("%lf %lf %lf \n",y[0],y[1],y[2]);
t += dt;
}
Since you're actually computing all these values at the same time, what you really want is for the function to return them all together. The easiest way to do this is to pass in a pointer to an array, into which the function will write the values. Or perhaps two arrays; it looks to me as if the output of your function is (conceptually) a 3x3 matrix together with a length-3 vector.
So the declaration of sys would look something like this:
void sys(double v[3], double JYt[3][3], double r, const double y[12]);
where v would end up containing the first three elements of your flat and JYt would contain the rest. (More informative names are probably possible.)
Incidentally, the for loop at the end of your code is exactly equivalent to just saying return flat[c]; except that if c happens not to be >=0 and <N*(N+1) then control will just fall off the end of your function, which in practice means that it will return some random number that almost certainly isn't what you want.
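A sketch of what such a function could look like, using the declaration above. It follows the question's own formulas and its layout of y[3..11], keeps s and b as file-scope constants as in the question, and computes everything in a single call. Whether the variational part is mathematically what the asker wants is not something this sketch tries to verify.

```c
/* Lorenz parameters, as in the question. */
static const double s = 10.0;
static const double b = 8.0 / 3.0;

/* Writes the three Lorenz derivatives into v and the transpose of J*Y
   into JYt. Y is read from y[3..11] with the layout used in the question. */
static void sys(double v[3], double JYt[3][3], double r, const double y[12])
{
    const double J[3][3] = {
        { -s,       s,    0.0   },
        { r - y[2], -1.0, -y[0] },
        { y[1],     y[0], -b    }
    };
    const double Y[3][3] = {
        { y[3], y[6], y[9]  },
        { y[4], y[7], y[10] },
        { y[5], y[8], y[11] }
    };

    v[0] = s * (y[1] - y[0]);
    v[1] = y[0] * (r - y[2]) - y[1];
    v[2] = y[0] * y[1] - b * y[2];

    for (int l = 0; l < 3; l++) {
        for (int m = 0; m < 3; m++) {
            double sum = 0.0;
            for (int n = 0; n < 3; n++)
                sum += J[l][n] * Y[n][m];
            JYt[m][l] = sum;    /* store directly in transposed position */
        }
    }
}
```

Each Runge-Kutta stage then needs one call to sys() instead of twelve, and the caller copies v and JYt into its k arrays.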
Your function sys() does an O(N^3) calculation to multiply two matrices, then does a couple of O(N^2) operations, and finally selects a single number to return. Then it is called the next time and goes through most of the same processing. It feels a tad wasteful unless (even if?) the matrices are really small.
The final loop in the function is a little odd, too:
for(l=0;l<(N*(N+1));l++)
{
if(c==l)
{
return flat[c];
}
}
Isn't that more simply written as:
return flat[c];
Or, perhaps:
if (c < N * (N+1))
return flat[c];
else
...do something on disastrous error other than fall off the end of the
...function without returning a value as the code currently does...
I don't see where you are selecting an algorithm by the value of j. If that's what you're trying to describe, in C you can have an array of pointers to functions; you could use a numerical index to choose a function from the array, but you can also pass a pointer-to-a-function to another function that will call it.
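As an illustration of both mechanisms, here is a sketch in which a generic RK4 step receives the derivative function as a pointer, and the function itself is picked from an array by index. The scalar ODEs growth and decay are invented examples, as are the names deriv_fn, rk4_step and systems.

```c
/* Type of a derivative function dy/dt = f(t, y). */
typedef double (*deriv_fn)(double t, double y);

/* One classical fourth-order Runge-Kutta step for a scalar ODE. */
static double rk4_step(deriv_fn f, double t, double y, double dt)
{
    double k1 = dt * f(t, y);
    double k2 = dt * f(t + 0.5 * dt, y + 0.5 * k1);
    double k3 = dt * f(t + 0.5 * dt, y + 0.5 * k2);
    double k4 = dt * f(t + dt, y + k3);
    return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}

/* Two example right-hand sides: dy/dt = y and dy/dt = -y. */
static double growth(double t, double y) { (void)t; return y; }
static double decay(double t, double y)  { (void)t; return -y; }

/* Array of function pointers: a numerical index selects the equation. */
static const deriv_fn systems[] = { growth, decay };
```

A call such as rk4_step(systems[j], t, y, dt) then integrates equation number j without any switch in the stepping code.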
That said: Judging from your code, you should keep it simple. If you want to use a number to control which code gets executed, just use an if or switch statement.
switch (c) {
case 0:
/* Algorithm 0 */
break;
case 1:
/* Algorithm 1 */
etc.
void* memorycopy (void *des, const void *src, size_t count)
{
size_t n = (count + 7) / 8;
char* destination = (char *) des;
char* source = (char *) src;
switch (count % 8)
{
case 0: do{ *destination++ = *source++;
case 7: *destination++ = *source++;
case 6: *destination++ = *source++;
case 5: *destination++ = *source++;
case 4: *destination++ = *source++;
case 3: *destination++ = *source++;
case 2: *destination++ = *source++;
case 1: *destination++ = *source++;
} while (--n > 0);
}
return des;
}
void tworegistervarswap (int *x, int *y)
{
if (x != y)
{
*x = *x ^ *y;
*y = *x ^ *y;
*x = *x ^ *y;
}
}
int bigintegeraverage (int x, int y)
{
return (x & y) + ((x ^ y) >> 1);
}
The horror with switch is Duff's device (a creatively unrolled loop, but otherwise it just copies source to destination).
tworegistervarswap swaps the values pointed to by x and y using bitwise XOR.
bigintegeraverage is a sneaky way to get the average of two integers in a potentially non-portable manner but without the possibility of overflow. (See Aggregate Magic Algorithms for details.)
It's called Duff's device. It uses a switch statement to jump into the middle of an unrolled loop (the same trick can be used to implement state machines in C). In this particular case it performs one loop branch per 8 bytes copied.
Read this for additional info: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
The second function swaps two integers.
The third just computes the average of two numbers, because the sum can be written as
a + b = (a ^ b) + ((a & b) << 1); // the AND here represents carry
and by using ((a ^ b) >> 1) + (a & b) instead of (a + b) >> 1 we can avoid possible overflows.
See Tom Duff on Duff's Device:
The point of the device is to express general loop unrolling directly in C. People who have posted saying `just use memcpy' have missed the point, as have those who have criticized it using various machine-dependent memcpy implementations as support. In fact, the example in the message is not implementable as memcpy, nor is any computer likely to have an memcpy-like idiom that implements it.
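For experimenting with the device, here is a variant of the memorycopy() function from the question with one change: a guard for count == 0. As written in the question, a count of 0 enters the loop at case 0 with n == 0, so --n wraps around and the loop keeps copying. The name duff_copy is made up for this sketch.

```c
#include <stddef.h>

/* Byte copy using Duff's device: the switch jumps into the middle of an
   eight-way unrolled loop to handle the count % 8 leftover bytes first. */
static void *duff_copy(void *des, const void *src, size_t count)
{
    unsigned char *d = des;
    const unsigned char *s = src;
    size_t n = (count + 7) / 8;     /* number of passes through the loop */

    if (count == 0)                 /* guard that the original lacks */
        return des;

    switch (count % 8)
    {
    case 0: do { *d++ = *s++;
    case 7:      *d++ = *s++;
    case 6:      *d++ = *s++;
    case 5:      *d++ = *s++;
    case 4:      *d++ = *s++;
    case 3:      *d++ = *s++;
    case 2:      *d++ = *s++;
    case 1:      *d++ = *s++;
            } while (--n > 0);
    }
    return des;
}
```

For count = 13, the switch enters at case 5, copies 5 bytes, and then the loop runs one more full pass of 8.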
That is actually 3 functions.
memorycopy()... well, someone else can figure that one out :P
tworegistervarswap() seems to take pointers to two ints, and if they don't already match, then it does some bit fiddling with them (XOR). It swaps the values without using a temporary variable.
bigintegeraverage() takes two ints and returns the mean (thanks Steve Jessop) based on them, using bit fiddling as well (AND, XOR and RIGHT SHIFT).
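The pointer comparison in tworegistervarswap() is there for a reason: if both pointers refer to the same object, the first XOR sets it to zero and the value is lost. A small sketch showing both behaviours (the function names are made up):

```c
/* XOR swap with the aliasing guard, as in the question. */
static void xor_swap(int *x, int *y)
{
    if (x != y)
    {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}

/* Same thing without the guard: zeroes the value when x == y. */
static void xor_swap_unguarded(int *x, int *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}
```

Calling xor_swap_unguarded(&c, &c) leaves c equal to 0 whatever it held before, while xor_swap(&c, &c) leaves it untouched.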