Reduce the time of 2 dimension join in C

I recently wrote a program in C, and the data calculation turns out to be the bottleneck during execution. The details are as follows.
The data structure is:
typedef struct tuple_t{
int oid;
int min_x;
int min_y;
int max_x;
int max_y;
} tuple_t;
the code is
for (i = 0; i < Qry->num_tuples; i++) {
tuple_t Qi = Qry->tuples[i];
for (j = 0; j < Obj->num_tuples; j++) {
tuple_t Oj = Obj->tuples[j];
int test_top_bit = (Oj.min_x - Qi.min_x) | (Qi.max_x - Oj.min_x)
| (Oj.min_y - Qi.min_y) | (Qi.max_y - Oj.min_y);
test_top_bit >= 0 ? matches++ : 0;
}
}
The code is used to test whether a point lies inside a rectangle in two dimensions.
Qry->num_tuples and Obj->num_tuples are both 5 million. When I run the test, it takes 887 milliseconds.
I also tested the clause
if(Oj.min_x == Qi.min_x)
count++;
and the time is only 3 milliseconds. So most of the time is spent on the clause:
int test_top_bit = (Oj.min_x - Qi.min_x) | (Qi.max_x - Oj.min_x)
| (Oj.min_y - Qi.min_y) | (Qi.max_y - Oj.min_y);
test_top_bit >= 0 ? matches++ : 0;
I tried other join algorithms, but the time is still very long.
Is there any way to improve the performance of this test? Could SIMD instructions such as SSE be used?

Looking at this line I see a performance problem:
tuple_t Oj = Obj->tuples[j];
You copy this struct 25 trillion times for no reason except cleaner code.
Try using a pointer instead.
tuple_t* pOj = &Obj->tuples[j];
You can also avoid the branch:
matches += ( (Oj.min_x - Qi.min_x) | (Qi.max_x - Oj.min_x) |
(Oj.min_y - Qi.min_y) | (Qi.max_y - Oj.min_y) ) >=0;
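Putting both suggestions together, a minimal sketch might look like the following. The function name count_matches and the table_t type are assumptions for illustration (the question only shows the ->tuples and ->num_tuples members), and the sign-bit test assumes that the coordinate differences do not overflow int.

```c
typedef struct tuple_t {
    int oid;
    int min_x, min_y, max_x, max_y;
} tuple_t;

/* Hypothetical container; the question only shows ->tuples and ->num_tuples. */
typedef struct table_t {
    tuple_t *tuples;
    int num_tuples;
} table_t;

/* Counts the (query, object) pairs for which the point
   (o->min_x, o->min_y) lies inside the query rectangle q. */
static long long count_matches(const table_t *Qry, const table_t *Obj)
{
    long long matches = 0;
    for (int i = 0; i < Qry->num_tuples; i++) {
        const tuple_t *q = &Qry->tuples[i];      /* pointer, no struct copy */
        for (int j = 0; j < Obj->num_tuples; j++) {
            const tuple_t *o = &Obj->tuples[j];  /* pointer, no struct copy */
            /* If any difference is negative, the OR has its sign bit set. */
            int t = (o->min_x - q->min_x) | (q->max_x - o->min_x)
                  | (o->min_y - q->min_y) | (q->max_y - o->min_y);
            matches += (t >= 0);                 /* adds 0 or 1, no branch */
        }
    }
    return matches;
}
```

Whether this is actually faster depends on the compiler and optimization level, since an optimizing compiler may already avoid the copy; measuring both versions is the only way to know.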

Related

How can i get the Big O Notations in this while loop?

The computational cost will only consider how many times c = c+1; is executed.
I want to represent the Big O notation to use n.
count = 0; index = 0; c = 0;
while (index <= n) {
count = count + 1;
index = index + count;
c = c + 1;
}
I think that if the loop runs k times, then index reaches k(k+1)/2 after the k-th iteration, so the loop stops when k(k+1)/2 is about n.
So I think O(root(n)) is the answer.
Is that the right solution to this question?

Is that the right solution to this question?
This is easy to test. The value of c when your while loop has finished will be the number of times the loop has run (and, thus, the number of times the c = c + 1; statement is executed). So, let us examine the values of c, for various n, and see how they differ from the posited O(√n) complexity:
#include <stdio.h>
#include <math.h>
int main()
{
printf(" c root(n) ratio\n"); // rubric
for (int i = 1; i < 10; ++i) {
int n = 10000000 * i;
int count = 0;
int index = 0;
int c = 0;
while (index < n) {
count = count + 1;
index = index + count;
c = c + 1;
}
double d = sqrt(n);
printf("%5d %8.3lf %8.5lf\n", c, d, c / d);
}
return 0;
}
Output:
c root(n) ratio
4472 3162.278 1.41417
6325 4472.136 1.41431
7746 5477.226 1.41422
8944 6324.555 1.41417
10000 7071.068 1.41421
10954 7745.967 1.41416
11832 8366.600 1.41419
12649 8944.272 1.41420
13416 9486.833 1.41417
We can see that, even though there are some 'rounding' errors, the last column appears reasonably constant (and, as it happens, an approximation to √2, which will generally improve as n becomes larger) – thus, as we ignore constant coefficients in Big-O notation, the complexity is, as you predicted, O(√n).

Let's first see how index changes for each loop iteration:
index = 0 + 1 = 1
index = 0 + 1 + 2 = 3
index = 0 + 1 + 2 + 3 = 6
...
index = 0 + 1 + ... + i-1 + i = O(i^2)
Then we need to figure out how many times the loop runs, which is equivalent of isolating i in the equation:
i^2 = n =>
i = sqrt(n)
So your algorithm runs in O(sqrt(n)) which also can be written as O(n^0.5).
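For completeness, the constant factor can be pinned down as well as the order. As a sketch, write T_c for the value of index after c iterations (the symbol T_c is introduced here for illustration; it does not appear in the question). The loop in the question, while (index <= n), stops at the smallest c with T_c > n:

```latex
\begin{align*}
T_c &= \sum_{k=1}^{c} k = \frac{c(c+1)}{2} \\
T_c > n &\iff c^2 + c - 2n > 0 \iff c > \frac{-1 + \sqrt{1 + 8n}}{2} \\
c &\approx \sqrt{2n} = \sqrt{2}\,\sqrt{n} = O(\sqrt{n})
\end{align*}
```

So the iteration count grows like the square root of n, with a leading constant of about 1.414.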

Issue when using SolveWithGuess (Eigen 3.2.3)

I would like to use the SolveWithGuess() function to solve a system of linear equations, starting from a good approximation. However, when I tried to do a test with an initial guess having small perturbations (1.e-9*i; 0 <= i <= 20) compared to the true solution, I received the following error values during the conjugate gradient iterations:
0 1.922e-09
1 3.694e-09
2 7.101e-09
3 1.365e-08
4 2.623e-08
5 5.043e-08
6 9.692e-08
7 1.863e-07
8 3.581e-07
9 6.882e-07
10 1.323e-06
Could you please tell me what the problem could be? My test code is the following:
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <Eigen/Eigen>
using namespace Eigen;
using namespace std;
void solve()
{
int n = 20;
typedef SparseMatrix<double, ColMajor> SM;
typedef Matrix<double, -1, 1> DV;
SM a(n,n);
DV b(n), x(n);
for (int i = 0; i < n; i++)
{
b[i] = double(i);
for (int j = 0; j < n; j++) a.insert(i, j) = 0.0;
a.coeffRef(i, n - i - 1) = 1.0;
}
for (int i = 0; i < n; i++) x[i] = double(n - i - 1) + 1.e-9 * double(i);
ConjugateGradient<SM> cg;
cg.setMaxIterations(1);
cg.compute(a);
for (int it = 0; it < 100; it++)
{
cg.compute(a);
x = cg.solveWithGuess(b,x);
cout << it << " " << scientific << setw(10) << setprecision(3) << cg.error() << endl;
}
}

There are actually two problems. First, your matrix is not positive definite, so you should use one of the other iterative solvers instead. More importantly, if you set the maximum number of iterations to 1 and call solveWithGuess repeatedly, you are effectively doing something like gradient descent (it does not keep the previous search direction), which happens to behave terribly with your matrix.
If you are actually interested in what happens "inside" an iterative solver, you need to insert debug code into the corresponding _solve_impl method (or re-implement the solver accordingly).
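The first point can be checked by hand or with a few lines of plain C (C rather than Eigen here, to keep the check dependency-free). The matrix in the question has ones on the anti-diagonal and zeros elsewhere, so x^T A x is the sum of x[i] * x[n - i - 1]. A positive definite matrix would make this positive for every nonzero x. The helper name quad_form_antidiag below is an assumption for illustration.

```c
/* Returns x^T A x for the n-by-n matrix A of the question, where
   A(i, n - i - 1) = 1 and every other entry is 0, so (A x)[i] = x[n - i - 1]. */
static double quad_form_antidiag(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * x[n - i - 1];
    return s;
}
```

For n = 20 and a vector with 1 in the first entry, -1 in the last entry and zeros elsewhere, the function returns -2, while flipping the last entry to +1 gives +2. The quadratic form takes both signs, so the matrix is indefinite.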

C - How to copy [600][400] array into [4][4] array, then randomize element positions?

I'm trying to make a function that shuffles an image as shown below:
It takes three 600x400 arrays (R, G, and B) that hold the pixel colors. I have been brainstorming this for many hours, but I'm confused about how to do it. Here's an idea I've tried working out, but I got overwhelmed and stumped:
Copy each RGB array (R[][], G[][], and B[][] separately, which combined make a colored image) into its respective temporary array. Split the temporary arrays into a 4x4 grid. Each element would contain its own 2D array with a block of the original image. Then, using the random library, I can assign the elements to new locations within the 4x4 grid. I have no idea how to do this without making 48 arrays (16 arrays per color in the 4x4 grid, so 48 arrays for R, G, and B). I would appreciate any advice. Here is the code I currently have, which I have paused (or possibly abandoned) working on:
void Shuffle(unsigned char R[WIDTH][HEIGHT], unsigned char G[WIDTH][HEIGHT], unsigned char B[WIDTH][HEIGHT]){
// Initialize 150x100 inner shuffle arrays. These arrays are chunks of the original image
int shuffArrR[150][100] = {0};
int shuffArrG[150][100] = {0};
int shuffArrB[150][100] = {0};
int row = 0, col = 0;
/*
BOUNDARY INFO FOR 4x4 ARRAY:
C1: C2: C3: C4: hBound# (row):
--------------------> 1
R1: | | | | |
--------------------> 2
R2: | | | | |
--------------------> 3
R3: | | | | |
--------------------> 4
R4: | | | | |
--------------------> 5
| | | | |
v v v v v
vBound# (col): 1 2 3 4 5
vBound: hBound:
#: col: #: row:
1 0 1 0
2 150 2 100
3 300 3 200
4 450 4 300
5 600 5 400
*/
// Define boundaries
int const vBound1 = 0, vBound2 = 150, vBound3 = 300, vBound4 = 450;
int const hBound1 = 0, hBound2 = 100, hBound3 = 200, hBound4 = 300;
for(row; row < HEIGHT; row++){
for(col; col < WIDTH; col++){
// Copy RGB arrays to shuffle arrays
shuffArrR[col][row] = R[col][row];
shuffArrG[col][row] = G[col][row];
shuffArrB[col][row] = B[col][row];
// Define 16 blocks in 4x4 array ------------------
// If in R1
if(row >= hBound1 && row <= hBound2){
// And in C1
if(col >= vBound1 && col <= vBound2){
// ** I stopped here after I realized how many arrays I'd have to make to account for every element in the 4x4 **
}
}
}
}
}

Use a better data structure
Right now, you are using multi-dimensional arrays, which have certain upsides and downsides that are lightly touched on elsewhere on SO. Because what you are doing is image processing, performance may be critical for you, and indexing multi-dimensional arrays is not exactly optimal for a variety of reasons (e.g. you're likelier to lose performance due to non-sequential reads).
There are a couple of ways to improve performance while also making your life easier:
Interleaved one-dimensional array
Meaning, you should use a single unsigned char img[WIDTH * HEIGHT * COLORS] array. This also makes your code easier to maintain, as you can then handle RGB, B&W and RGBA images by changing the constant COLORS. To access a given color of a single pixel, you would use img[y * WIDTH * COLORS + x * COLORS + color]. You could also write a macro to help with that, e.g.
#define INDEX_XYZ(x,y,color) ((y) * WIDTH * COLORS + (x) * COLORS + (color))
To further improve the usability of the function, consider passing it the size of each dimension, along with the number of colors. For example, you could change the signature to be...
void Shuffle(unsigned char image[], int height, int width, int colors);
Which would allow you to use the same function on images of any size (as long as both dimensions are divisible by four) and any color. You might also want to pass an argument indicating the number of subdivisions, so you could have a 3-by-3 segmentation or 8-by-8 segmentation if you wanted, and without having to change the function or repeat the code.
Split the image into segments
One way to do this would be to create the arrays for the segments...
unsigned char segments[SEG_HORI * SEG_VERT][(WIDTH / SEG_HORI) * (HEIGHT / SEG_VERT) * COLORS];
Note the multidimensionality - it is fine here, as it is desirable for us to have a number of separate segments to store the data in.
After which we copy data from the original:
// Calculate the height/width for the segments; you could do this as a macro too.
int seg_height = HEIGHT / SEG_VERT;
int seg_width = WIDTH / SEG_HORI;
// Iterate through the rows in the picture
for (int y = 0; y < HEIGHT; y++)
{
// Obtain the Y-coordinate of the segment.
int segy = y / seg_height;
// Iterate through the columns in the picture
for (int x = 0; x < WIDTH; x++)
{
// Calculate the X-coordinate of the segment.
int segx = x / seg_width,
// Then calculate its index, using the X and Y coordinates.
seg = segy * SEG_HORI + segx,
// Then, calculate the source index (from the image).
src_idx = y * WIDTH * COLORS + x * COLORS,
// Then, map the coordinates to the segment; notice that we take
// modulos on the coordinates to get them to correctly map.
dst_idx = y % seg_height * seg_width * COLORS + x % seg_width * COLORS;
// Then copy the colors. You could also use memcpy(),
// but the below should be more educational.
for (int c = 0; c < COLORS; c++)
segments[seg][dst_idx + c] = img[src_idx + c];
}
}
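Now, the image has been copied into the segments, and you can reorder them as you wish. Because segments is declared as a two-dimensional array, its rows cannot be reassigned like pointers; either swap their contents with memcpy() (from <string.h>), as below, or keep a separate array of pointers to the rows and swap those instead. For example, the below would swap the top-left and the bottom-right segments.
unsigned char seg_temp[sizeof segments[0]];
memcpy(seg_temp, segments[0], sizeof seg_temp);
memcpy(segments[0], segments[15], sizeof seg_temp);
memcpy(segments[15], seg_temp, sizeof seg_temp);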
Finally, to finalize the process and merge the segments back together, you need to redo the process above in reverse; it should be quite trivial to do, so I'll leave it for you as an exercise.
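For readers who want to check their attempt at that exercise, here is one possible sketch of the merge step. The macro names SEG_W, SEG_H, SEG_SIZE and NUM_SEGS, the function name merge_segments and the order parameter are assumptions introduced for illustration, and the sizes are deliberately tiny; the question's image would use WIDTH 600, HEIGHT 400, COLORS 3 and a 4-by-4 grid.

```c
#include <string.h>

/* Small illustrative sizes (assumptions, not the question's values). */
#define WIDTH    8
#define HEIGHT   4
#define COLORS   3
#define SEG_HORI 4
#define SEG_VERT 2
#define SEG_W    (WIDTH / SEG_HORI)
#define SEG_H    (HEIGHT / SEG_VERT)
#define SEG_SIZE (SEG_W * SEG_H * COLORS)
#define NUM_SEGS (SEG_HORI * SEG_VERT)

/* Writes the segments back into img. order[k] names the segment that
   ends up in grid cell k (cells counted row by row); it must be a
   permutation of 0..NUM_SEGS-1. */
static void merge_segments(unsigned char img[WIDTH * HEIGHT * COLORS],
                           unsigned char segments[NUM_SEGS][SEG_SIZE],
                           const int order[NUM_SEGS])
{
    for (int y = 0; y < HEIGHT; y++) {
        int segy = y / SEG_H;
        /* Step one segment width at a time along the image row. */
        for (int x = 0; x < WIDTH; x += SEG_W) {
            int cell = segy * SEG_HORI + x / SEG_W;
            int dst_idx = (y * WIDTH + x) * COLORS;
            int src_idx = (y % SEG_H) * SEG_W * COLORS;
            /* One row of one segment is contiguous in both buffers. */
            memcpy(&img[dst_idx], &segments[order[cell]][src_idx],
                   SEG_W * COLORS);
        }
    }
}
```

Passing a shuffled order array makes the reordering and the merge a single pass, so the segment buffers themselves never have to be swapped.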
Final Notes
If you haven't already, you should familiarize yourself with the malloc() and free() functions, as well as memset() and memcpy(). They should prove very useful in the future, and would also improve the performance here as then you could copy everything into a destination array (along with the shuffle) in n operations, instead of modifying the original in 2n.
Disclaimer 1: I haven't run any of the code through a compiler. No guarantees that it will work out of the box.
Disclaimer 2: I also do not claim the code to be well-optimized. Have to leave something for you to do.

Label each block with an ID, 0, 1, 2, ...15 .
-----------------
| 12| 13| 14| 15|
-----------------
| 8 | 9 | 10| 11|
-----------------
| 4 | 5 | 6 | 7 |
-----------------
| 0 | 1 | 2 | 3 |
-----------------
Put all the IDs in an array, then shuffle the array (for example with the arr_shuffle function in the sample code further down). Then traverse the array and swap the contents of each pair of blocks.
int arr[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
arr_shuffle(arr, 16);
int i;
for (i = 0; i < 16; i++) {
swap_block(i, arr[i]);
}
Now the problem is how to swap two blocks. Let's say we have block A and block B. Each should be 100 (height) by 150 (width). Then think of A as an array A[100][150] and of B as B[100][150]. Swapping the two arrays looks like this:
for (i = 0; i < 100; i++) {
for (j = 0; j < 150; j++) {
swap(A[i][j], B[i][j]); // pseudocode: exchange the two elements
}
}
The final step is to convert A[i][j] and B[i][j] to the real elements in the arrays R/G/B. This can be done with simple arithmetic.
void get_real_id(int block_id, int x, int y, int *real_x, int *real_y)
{
int row, col;
row = block_id / 4; // convert block id to row number
col = block_id % 4; // convert block id to col number
// find BLOCK[y][x] in array R, which should be R[real_y][real_x]
*real_x = (col * (WIDTH/4)) + x;
*real_y = (row * (HEIGHT/4)) + y;
}
The sample code below works for the array R. R is defined as R[HEIGHT][WIDTH], not R[WIDTH][HEIGHT] (that definition should work too, but I find it harder to reason about).
#include <stdlib.h> // rand(), RAND_MAX
int R[HEIGHT][WIDTH];
// Shuffle arr[0..len-1] in place (Fisher-Yates).
void arr_shuffle(int *arr, int len)
{
int i;
for (i = 0; i < len - 1; i++)
{
int j = i + rand() / (RAND_MAX / (len - i) + 1);
int t = arr[j];
arr[j] = arr[i];
arr[i] = t;
}
}
void get_real_id(int block_id, int x, int y, int *real_x, int *real_y)
{
int row, col;
row = block_id / 4;
col = block_id % 4;
*real_x = (col * (WIDTH/4)) + x;
*real_y = (row * (HEIGHT/4)) + y;
}
void swap_block(int src, int dst)
{
int w_len = WIDTH / 4; // should be 150
int h_len = HEIGHT / 4; // should be 100
int i, j;
for (i = 0; i < h_len; i++) {
for (j = 0; j < w_len; j++) {
int real_src_x;
int real_src_y;
int real_dst_x;
int real_dst_y;
get_real_id(src, j, i, &real_src_x, &real_src_y);
get_real_id(dst, j, i, &real_dst_x, &real_dst_y);
// swap two point.
int r = R[real_src_y][real_src_x];
R[real_src_y][real_src_x] = R[real_dst_y][real_dst_x];
R[real_dst_y][real_dst_x] = r;
}
}
}
void Shuffle(void)
{
int i;
int arr[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
arr_shuffle(arr, 16);
for (i = 0; i < 16; i++) {
int src_block_id = i;
int dst_block_id = arr[i];
swap_block(src_block_id, dst_block_id);
}
}
I should mention that there is a chance that nothing changes after the shuffle.

Unexpected Value Output When Not Using Visual Studio

I've been working on a program for my Algorithm Analysis class where I have to solve the Knapsack problem with Brute Force, greedy, dynamic, and branch and bound strategies. Everything works perfectly when I run it in Visual Studio 2012, but if I compile with gcc and run it on the command line, I get a different result:
Visual Studio:
+-------------------------------------------------------------------------------+
| Number of | Processing time in seconds / Maximum benefit value |
| +---------------+---------------+---------------+---------------+
| items | Brute force | Greedy | D.P. | B. & B. |
+---------------+---------------+---------------+---------------+---------------+
| 10 + 0 / 1290 + 0 / 1328 + 0 / 1290 + 0 / 1290 |
+---------------+---------------+---------------+---------------+---------------+
| 20 + 0 / 3286 + 0 / 3295 + 0 / 3200 + 0 / 3286 |
+---------------+---------------+---------------+---------------+---------------+
cmd:
+-------------------------------------------------------------------------------+
| Number of | Processing time in seconds / Maximum benefit value |
| +---------------+---------------+---------------+---------------+
| items | Brute force | Greedy | D.P. | B. & B. |
+---------------+---------------+---------------+---------------+---------------+
| 10 + 0 / 1290 + 0 / 1328 + 0 / 1599229779+ 0 / 1290 |
+---------------+---------------+---------------+---------------+---------------+
| 20 + 0 / 3286 + 0 / 3295 + 0 / 3200 + 0 / 3286 |
+---------------+---------------+---------------+---------------+---------------+
The same number always shows up, "1599229779." Notice that the output is only messed up the first time the Dynamic algorithm is run.
Here is my code:
typedef struct{
short value; //This is the value of the item
short weight; //This is the weight of the item
float ratio; //This is the ratio of value/weight
} itemType;
typedef struct{
time_t startingTime;
time_t endingTime;
int maxValue;
} result;
result solveWithDynamic(itemType items[], int itemsLength, int maxCapacity){
result answer;
int rowSize = 2;
int colSize = maxCapacity + 1;
int i, j; //used in loops
int otherColumn, thisColumn;
answer.startingTime = time(NULL);
int **table = (int**)malloc((sizeof *table) * rowSize);//[2][(MAX_ITEMS*WEIGHT_MULTIPLIER)];
for(i = 0; i < rowSize; i ++)
table[i] = (int*)malloc((sizeof *table[i]) * colSize);
table[0][0] = 0;
table[1][0] = 0;
for(i = 1; i < maxCapacity; i++) table[1][i] = 0;
for(i = 0; i < itemsLength; i++){
thisColumn = i%2;
otherColumn = (i+1)%2; //this is always the other column
for(j = 1; j < maxCapacity + 1; j++){
if(items[i].weight <= j){
if(items[i].value + table[otherColumn][j-items[i].weight] > table[otherColumn][j])
table[thisColumn][j] = items[i].value + table[otherColumn][j-items[i].weight];
else
table[thisColumn][j] = table[otherColumn][j];
} else {
table[thisColumn][j] = table[thisColumn][j-1];
}//end if/else
}//end for
}//end for
answer.maxValue = table[thisColumn][maxCapacity];
answer.endingTime = time(NULL);
for(i = 0; i < rowSize; i ++)
free(table[i]);
free(table);
return answer;
}//end solveWithDynamic
Just a bit of explanation. I was having trouble with the memory consumption of this algorithm because I have to run it for a set of 10,000 items. I realized that I didn't need to store the whole table, because I only ever looked at the previous column. I actually figured out that you only need to store the current row and x+1 additional values, where x is the weight of the current itemType. It brought the memory required from (itemsLength+1) * (maxCapacity+1) elements to 2*(maxCapacity+1) and possibly (maxCapacity+1) + (x+1) (although I don't need to optimize it that much).
Also, I used printf("%d", answer.maxValue); in this function, and it still came out as "1599229779." Can anyone help me figure out what is going on? Thanks.

Can't be sure that that is what causes it, but
for(i = 1; i < maxCapacity; i++) table[1][i] = 0;
you leave table[1][maxCapacity] uninitialised, but then potentially use it:
for(j = 1; j < maxCapacity + 1; j++){
if(items[i].weight <= j){
if(items[i].value + table[otherColumn][j-items[i].weight] > table[otherColumn][j])
table[thisColumn][j] = items[i].value + table[otherColumn][j-items[i].weight];
else
table[thisColumn][j] = table[otherColumn][j];
} else {
table[thisColumn][j] = table[thisColumn][j-1];
}//end if/else
}//end for
If that is always zero with Visual Studio, but nonzero with gcc, that could explain the difference.
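If that is the cause, zero-initialising the whole table removes the dependence on whatever malloc() happens to return. The sketch below is one way to do it, using calloc() and the textbook 0/1 knapsack recurrence with two rolling rows. The function name knapsack_two_rows and the separate values/weights arrays are assumptions for illustration, not the asker's code, and item weights are assumed to be non-negative. Note that when an item does not fit, the sketch copies the value from the previous row, which differs from the table[thisColumn][j-1] used in the question.

```c
#include <stdlib.h>

/* Returns the best total value for capacity max_capacity, or -1 if the
   allocation fails. Uses two rolling rows of max_capacity + 1 entries. */
static int knapsack_two_rows(const short *values, const short *weights,
                             int n, int max_capacity)
{
    int cols = max_capacity + 1;
    /* calloc zero-fills both rows, including index max_capacity. */
    int *table = calloc((size_t)2 * cols, sizeof *table);
    if (table == NULL)
        return -1;

    int *prev = table, *cur = table + cols;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= max_capacity; j++) {
            int best = prev[j];                    /* skip item i */
            if (weights[i] <= j) {
                int take = values[i] + prev[j - weights[i]];
                if (take > best)
                    best = take;                   /* take item i */
            }
            cur[j] = best;
        }
        int *tmp = prev; prev = cur; cur = tmp;    /* roll the rows */
    }
    int result = prev[max_capacity];               /* last row written */
    free(table);
    return result;
}
```

With values {60, 100, 120}, weights {10, 20, 30} and capacity 50, this returns 220 (the second and third items).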

Optimize Bilinear Resize Algorithm in C

Can anyone spot a way to improve the speed of the following bilinear resizing algorithm?
Speed is critical, but I also need to keep good image quality. The code is expected to run on mobile devices with slow CPUs.
The algorithm is used mainly for upscaling. Any other, faster bilinear algorithm would also be appreciated. Thanks.
void resize(int* input, int* output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
int a, b, c, d, x, y, index;
float x_ratio = ((float)(sourceWidth - 1)) / targetWidth;
float y_ratio = ((float)(sourceHeight - 1)) / targetHeight;
float x_diff, y_diff, blue, red, green ;
int offset = 0 ;
for (int i = 0; i < targetHeight; i++)
{
for (int j = 0; j < targetWidth; j++)
{
x = (int)(x_ratio * j) ;
y = (int)(y_ratio * i) ;
x_diff = (x_ratio * j) - x ;
y_diff = (y_ratio * i) - y ;
index = (y * sourceWidth + x) ;
a = input[index] ;
b = input[index + 1] ;
c = input[index + sourceWidth] ;
d = input[index + sourceWidth + 1] ;
// blue element
blue = (a&0xff)*(1-x_diff)*(1-y_diff) + (b&0xff)*(x_diff)*(1-y_diff) +
(c&0xff)*(y_diff)*(1-x_diff) + (d&0xff)*(x_diff*y_diff);
// green element
green = ((a>>8)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>8)&0xff)*(x_diff)*(1-y_diff) +
((c>>8)&0xff)*(y_diff)*(1-x_diff) + ((d>>8)&0xff)*(x_diff*y_diff);
// red element
red = ((a>>16)&0xff)*(1-x_diff)*(1-y_diff) + ((b>>16)&0xff)*(x_diff)*(1-y_diff) +
((c>>16)&0xff)*(y_diff)*(1-x_diff) + ((d>>16)&0xff)*(x_diff*y_diff);
output [offset++] =
0x000000ff | // alpha
((((int)red) << 24)&0xff0000) |
((((int)green) << 16)&0xff00) |
((((int)blue) << 8)&0xff00);
}
}
}

Off the top of my head:
Stop using floating-point, unless you're certain your target CPU has it in hardware with good performance.
Make sure memory accesses are cache-optimized, i.e. clumped together.
Use the fastest data types possible. Sometimes this means smallest, sometimes it means "most native, requiring least overhead".
Investigate if signed/unsigned for integer operations have performance costs on your platform.
Investigate if look-up tables rather than computations gain you anything (but these can blow the caches, so be careful).
And, of course, do lots of profiling and measurements.

In-Line Cache and Lookup Tables
Cache your computations in your algorithm.
Avoid duplicate computations (like (1-y_diff) or (x_ratio * j))
Go through all the lines of your algorithm, and try to identify patterns of repetitions. Extract these to local variables. And possibly extract to functions, if they are short enough to be inlined, to make things more readable.
Use a lookup-table
It's quite likely that, if you can spare some memory, you can implement a "store" for your RGB values and simply "fetch" them based on the inputs that produced them. Maybe you don't need to store all of them, but you could experiment and see if some come back often. Alternatively, you could "fudge" your colors and thus end up with fewer values to store for more lookup inputs.
If you know the boundaries for your inputs, you can calculate the complete domain space and figure out what makes sense to cache. For instance, if you can't cache the whole R, G, B values, maybe you can at least pre-compute the shifts ((b>>16) and so forth), which are most likely deterministic in your case.
Use the Right Data Types for Performance
If you can avoid double and float variables, use int. On most architectures, int would be the fastest type for computations because of the memory model. You can still achieve decent precision by simply scaling your units (i.e. use 1026 as an int instead of 1.026 as a double or float). It's quite likely that this trick would be enough for you.
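As a sketch of the integer-only idea, the function below blends one 8-bit channel of four neighbouring pixels using fixed-point weights with 8 fractional bits. The name bilerp_channel and the choice of 8 fractional bits are assumptions for illustration; fx and fy are the fractional positions scaled to the range 0..256, and unsigned is assumed to be at least 32 bits wide.

```c
/* Bilinear blend of one 8-bit channel using integers only.
   a, b: top-left / top-right; c, d: bottom-left / bottom-right.
   fx, fy: horizontal / vertical fraction in 1/256 units, range 0..256. */
static unsigned bilerp_channel(unsigned a, unsigned b, unsigned c, unsigned d,
                               unsigned fx, unsigned fy)
{
    unsigned top    = a * (256 - fx) + b * fx;   /* scaled by 256 */
    unsigned bottom = c * (256 - fx) + d * fx;   /* scaled by 256 */
    /* The weighted sum is scaled by 256 * 256; add half for rounding. */
    return (top * (256 - fy) + bottom * fy + 32768) >> 16;
}
```

In the question's loop, a, b, c and d would be one channel extracted from input[index], input[index + 1], input[index + sourceWidth] and input[index + sourceWidth + 1], and fx, fy would replace x_diff and y_diff. Whether this beats the float version depends on the target CPU, so it is worth profiling both.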

x = (int)(x_ratio * j) ;
y = (int)(y_ratio * i) ;
x_diff = (x_ratio * j) - x ;
y_diff = (y_ratio * i) - y ;
index = (y * sourceWidth + x) ;
This could surely use some optimization: you computed x_ratio * (j-1) just one iteration earlier, so all you really need here is to keep a running value and add x_ratio to it each iteration instead of multiplying again.

My random guess (use a profiler instead of letting people guess!):
The compiler has to generate code that works when input and output overlap, which means it has to generate lots of redundant stores and loads. Add restrict to the input and output parameters to remove that safety feature.
You could also try using a=b; and c=d; instead of loading them again.

Here is my version; steal some ideas. My C-fu is quite weak, so some lines are pseudocode, but you can fix them.
void resize(int* input, int* output,
int sourceWidth, int sourceHeight,
int targetWidth, int targetHeight
) {
// Let's create some lookup tables!
// you can move them into 2-dimensional arrays to
// group together values used at the same time to help processor cache
int sx[0..targetWidth ]; // target->source X lookup
int sy[0..targetHeight]; // target->source Y lookup
int mx[0..targetWidth ]; // left pixel's multiplier
int my[0..targetHeight]; // bottom pixel's multiplier
// we don't have to calc indexes every time, find out when
bool reloadPixels[0..targetWidth ];
bool shiftPixels[0..targetWidth ];
int shiftReloadPixels[0..targetWidth ]; // can be combined if necessary
int v; // temporary value
for (int j = 0; j < targetWidth; j++){
// (8bit + targetBits + sourceBits) should be < max int
v = 256 * j * (sourceWidth-1) / (targetWidth-1);
sx[j] = v / 256;
mx[j] = v % 256;
reloadPixels[j] = j ? ( sx[j-1] != sx[j] ? 1 : 0)
: 1; // always load first pixel
// if no reload -> then no shift too
shiftPixels[j] = j ? ( sx[j-1]+1 == sx[j] ? 2 : 0)
: 0; // nothing to shift at first pixel
shiftReloadPixels[j] = reloadPixels[j] | shiftPixels[j];
}
for (int i = 0; i < targetHeight; i++){
v = 256 * i * (sourceHeight-1) / (targetHeight-1);
sy[i] = v / 256;
my[i] = v % 256;
}
int shiftReload;
int srcIndex;
int srcRowIndex;
int offset = 0;
int lm, rm, tm, bm; // left / right / top / bottom multipliers
int a, b, c, d;
for (int i = 0; i < targetHeight; i++){
srcRowIndex = sy[ i ] * sourceWidth;
tm = my[i];
bm = 255 - tm;
for (int j = 0; j < targetWidth; j++){
// too much ifs can be too slow, measure.
// always true for first pixel in a row
if( shiftReload = shiftReloadPixels[ j ] ){
srcIndex = srcRowIndex + sx[j];
if( shiftReload & 2 ){
a = b;
c = d;
}else{
a = input[ srcIndex ];
c = input[ srcIndex + sourceWidth ];
}
b = input[ srcIndex + 1 ];
d = input[ srcIndex + 1 + sourceWidth ];
}
lm = mx[j];
rm = 255 - lm;
// WTF?
// Input AA RR GG BB
// Output RR GG BB AA
if( j ){
leftOutput = rightOutput ^ 0xFFFFFF00;
}else{
leftOutput =
// blue element
((( ( (a&0xFF)*tm
+ (c&0xFF)*bm )*lm
) & 0xFF0000 ) >> 8)
// green element
| ((( ( ((a>>8)&0xFF)*tm
+ ((c>>8)&0xFF)*bm )*lm
) & 0xFF0000 )) // no need to shift
// red element
| ((( ( ((a>>16)&0xFF)*tm
+ ((c>>16)&0xFF)*bm )*lm
) & 0xFF0000 ) << 8 )
;
}
rightOutput =
// blue element
((( ( (b&0xFF)*tm
+ (d&0xFF)*bm )*lm
) & 0xFF0000 ) >> 8)
// green element
| ((( ( ((b>>8)&0xFF)*tm
+ ((d>>8)&0xFF)*bm )*lm
) & 0xFF0000 )) // no need to shift
// red element
| ((( ( ((b>>16)&0xFF)*tm
+ ((d>>16)&0xFF)*bm )*lm
) & 0xFF0000 ) << 8 )
;
output[offset++] =
// alpha
0x000000ff
| leftOutput
| rightOutput
;
}
}
}
