Peculiar problem regarding convolution in PPM format - c

After debugging and trying different image viewers (xdg-open, GIMP), I still have a bug which throws me off completely.
The problem concerns convolution of PPM images whose width and height differ. I'm using a 1500x1000 px image: a mask of {0,0,0, 0,1,0, 0,0,0} works just fine (it simply copies the image), but for a mask whose first or last row is non-zero, e.g. {0,1,0, 0,0,0, 0,0,0}, the image is shifted rightwards by 1/3 of its size. I find it peculiar because, as far as I know, I have no overflow or pointer arithmetic that might cause this problem.
I've narrowed it down to the convolution kernel. As far as I know there are no problems saving or reading the image; after running i_convolution the image is just shifted by a fixed amount.
/* kernel[] is a global array of kernel_size * kernel_size coefficients */
void i_convolution(unsigned int **in, unsigned int ***out,
                   int y_max, int x_max, int kernel_size)
{
    int kernel_sum = 0;
    for (int i = 0; i < kernel_size; i++)
    {
        for (int j = 0; j < kernel_size; j++)
        {
            kernel_sum += kernel[i * kernel_size + j];
        }
    }
    printf("kernel sum = %d\n", kernel_sum);
    for (int i = 1; i < y_max - 1; i++)
    {
        for (int j = 1; j < x_max - 1; j++)
        {
            int r = 0;
            int g = 0;
            int b = 0;
            for (int y_conv = -1; y_conv <= 1; y_conv++)
            {
                for (int x_conv = -1; x_conv <= 1; x_conv++)
                {
                    int y_index = i + y_conv;
                    int x_index = j + x_conv;
                    unsigned char rval = (unsigned char)(in[y_index][x_index] & 0xff);
                    unsigned char gval = (unsigned char)((in[y_index][x_index] & 0xff00) >> 8);
                    unsigned char bval = (unsigned char)((in[y_index][x_index] & 0xff0000) >> 16);
                    int kernel_val = kernel[(y_conv + 1) * kernel_size + (x_conv + 1)];
                    r += (int)(rval * kernel_val);
                    g += (int)(gval * kernel_val);
                    b += (int)(bval * kernel_val);
                }
            }
            // normalize by the kernel sum (note: a kernel_sum of 0,
            // e.g. for edge-detect kernels, would divide by zero here)
            r /= kernel_sum;
            g /= kernel_sum;
            b /= kernel_sum;
            // b = abs(b);
            if (r > 255) r = 255;
            else if (r < 0) r = 0;
            if (g > 255) g = 255;
            else if (g < 0) g = 0;
            if (b > 255) b = 255;
            else if (b < 0) b = 0;
            unsigned int val = 0;
            val |= b & 0xff;
            val <<= 8;
            val |= g & 0xff;
            val <<= 8;
            val |= r & 0xff;
            (*out)[i][j] = val;
        }
    }
}
Let's take the kernel {0,1,0, 0,0,0, 0,0,0}. The results look like this, with the original on the left and the result after convolution on the right:
https://i.imgur.com/rzXKjUY.png
I will be thankful for any help.
Best regards.

I'm marking this as solved, because the problem was that I misinterpreted the PPM format's height and width, which caused this behaviour. Swapping y with x (and allocating memory accordingly) solves it!
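For anyone who hits the same thing: a P6 header stores width first and height second, so the row count comes from the second number. Below is a minimal header-reading sketch (my illustration, not the original code; parse_ppm_header is a hypothetical name and '#' comment lines are not handled):
#include <stdio.h>

/* Hypothetical helper: read a binary PPM (P6) header. The format stores
 * width before height, so rows must be allocated from the SECOND number.
 * Comment lines starting with '#' are not handled in this sketch. */
int parse_ppm_header(FILE *f, int *width, int *height, int *maxval)
{
    char magic[3];
    if (fscanf(f, "%2s", magic) != 1 || magic[0] != 'P' || magic[1] != '6')
        return -1;
    if (fscanf(f, "%d %d %d", width, height, maxval) != 3)
        return -1;
    fgetc(f); /* consume the single whitespace byte before the pixel data */
    return 0;
}

/* Allocation then uses height rows of width pixels:
 *   in = malloc(height * sizeof *in);
 *   for (int y = 0; y < height; y++) in[y] = malloc(width * sizeof **in);
 */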

Related

Radix Sort Float

I am trying to sort floats with radix sort. My current algorithm works with unsigned ints. For example, if I enter the values 12, 100, 1, my sorted values are 1, 12, and 100. However, when I use a function to convert floats to ints and back to floats after calling the radix sort, my values remain unsorted; they print as they were entered by the user.
I am unsure how to modify my current function to be able to sort floats with radix sort.
void rs(unsigned int *a, int c) {
    int i;
    int m = a[0];
    int bt = 0;
    unsigned int *b = malloc(0 * sizeof(int));
    for (i = 0; i < c; i++) {
        if (a[i] > m)
            m = a[i];
    }
    while ((m >> bt) > 0) {
        int buck[2] = { 0 };
        for (i = 0; i < c; i++) {
            buck[(a[i] >> bt) & 1]++;
        }
        for (i = 1; i < 2; i++) {
            buck[i] += buck[i - 1];
        }
        for (i = c - 1; i >= 0; i--) {
            b[--buck[(a[i] >> bt) & 1]] = a[i];
        }
        for (i = 0; i < c; i++) {
            a[i] = b[i];
        }
        bt++;
    }
    free(b);
}
The function I am using to transform floats to ints and back to floats (taken from Radix Sort for Floats) is:
void rfloat(float *arr, size_t size) {
    assert(sizeof(unsigned) == sizeof(float) && sizeof(float) == 4);
    unsigned *d = malloc(size * sizeof(unsigned));
    for (size_t i = 0; i < size; i++) {
        // Interpret float as 32-bit unsigned.
        d[i] = *(unsigned *) &arr[i];
        // Flip all except top if top bit is set.
        d[i] ^= (((unsigned) (((int) d[i]) >> 31)) >> 1);
        // Flip top bit.
        d[i] ^= (1u << 31);
    }
    rs(d, size);
    // Inverse transform.
    for (size_t i = 0; i < size; i++) {
        d[i] ^= (1u << 31);
        d[i] ^= (((unsigned) (((int) d[i]) >> 31)) >> 1);
        arr[i] = *(float *) &d[i];
    }
    free(d);
}
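For intuition about why this transform makes the keys sort correctly, here is a small demonstration sketch (mine, not from the post): negative floats get all their bits flipped and non-negative floats get only the sign bit flipped, so unsigned comparison of the resulting keys matches float comparison:
#include <stdio.h>
#include <string.h>

/* Demonstration sketch (not from the original post): the same XOR transform
 * shown in isolation. */
static unsigned float_key(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);             /* reinterpret without the pointer cast */
    u ^= ((unsigned)((int)u >> 31)) >> 1; /* flip lower 31 bits if negative */
    u ^= 1u << 31;                        /* flip the sign bit */
    return u;
}

int main(void)
{
    float v[] = { -100.0f, -1.0f, -0.0f, 0.0f, 1.0f, 12.0f, 100.0f };
    for (size_t i = 0; i < sizeof v / sizeof v[0]; i++)
        printf("%8.1f -> %08x\n", v[i], float_key(v[i])); /* keys come out ascending */
    return 0;
}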
There are multiple issues.
You use int all over the place where you should be using unsigned (for values) or size_t (for sizes/indices).
You allocate 0 bytes.
(m >> bt) > 0 doesn't work as a stop condition; shifting by a bit count equal to or greater than the type's width is not specified.
After changing the data types to unsigned, the loop boundaries don't work anymore.
I took the liberty of fixing the above and choosing some better variable names:
#include <limits.h>
#include <stdlib.h>

void rs(unsigned int *a, size_t c) {
    size_t i;
    unsigned bit = 0;
    unsigned *b = malloc(c * sizeof(unsigned));
    unsigned m = a[0]; // Max element.
    for (i = 0; i < c; i++) {
        if (a[i] > m) m = a[i];
    }
    while (bit < CHAR_BIT * sizeof(m) && (m >> bit)) {
        size_t bucket_len[2] = { 0, 0 };
        for (i = 0; i < c; i++) bucket_len[(a[i] >> bit) & 1]++;
        size_t bucket_end[2] = { bucket_len[0], bucket_len[0] + bucket_len[1] };
        for (i = c; i-- > 0; ) {
            size_t j = --bucket_end[(a[i] >> bit) & 1];
            b[j] = a[i];
        }
        for (i = 0; i < c; i++) a[i] = b[i];
        bit++;
    }
    free(b);
}
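A quick end-to-end usage sketch (mine, not part of the answer), combining the question's rfloat with the fixed rs above:
#include <stdio.h>

/* Assumes the fixed rs() and the question's rfloat() are defined above. */
int main(void)
{
    float v[] = { 12.0f, 100.0f, 1.0f, -3.5f, 0.25f };
    size_t n = sizeof v / sizeof v[0];
    rfloat(v, n);
    for (size_t i = 0; i < n; i++)
        printf("%g\n", v[i]); /* expected: -3.5 0.25 1 12 100 */
    return 0;
}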

Radix Sort Optimization

I was trying to optimize the radix sort code, because I felt there was room for it: traditional implementations in books and on the web seem to be direct copies of one another, and they also run very slowly because they take an arbitrary number such as 10 for the modulo operation. I have optimized the code as far as I could; maybe I have missed some optimization techniques, in which case please enlighten me.
Motivation for optimization:
http://codercorner.com/RadixSortRevisited.htm
http://stereopsis.com/radix.html
I was unable to implement all the optimizations in the articles; mostly it was beyond my skills and understanding, and I lacked sufficient time. If you can, feel free to implement them.
EDIT 4
This Java version of radix sort calculates all histograms in one read and does not need to fill array Z with zeros after every LSB sort, along with the usual ability to skip a sorting pass and jump to the next LSB pass if all the previous LSBs are the same. As usual, this is only for 32-bit integers, but a 64-bit version can be created from it.
protected static int[] DSC(int A[]) // Sorts in descending order
{
    int tmp[] = new int[A.length];
    int Z[] = new int[1024];
    int i, Jump, Jump2, Jump3, Jump4, swap[];

    Jump = A[0] & 255;
    Z[Jump] = 1;
    Jump2 = ((A[0] >> 8) & 255) + 256;
    Z[Jump2] = 1;
    Jump3 = ((A[0] >> 16) & 255) + 512;
    Z[Jump3] = 1;
    Jump4 = (A[0] >> 24) + 768;
    Z[Jump4] = 1;

    // Histograms creation
    for (i = 1; i < A.length; ++i)
    {
        ++Z[A[i] & 255];
        ++Z[((A[i] >> 8) & 255) + 256];
        ++Z[((A[i] >> 16) & 255) + 512];
        ++Z[(A[i] >> 24) + 768];
    }

    // 1st LSB Byte Sort
    if (Z[Jump] != A.length)
    {
        Z[0] = A.length - Z[0];
        for (i = 1; i < 256; ++i)
        {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (i = 0; i < A.length; ++i)
        {
            tmp[Z[A[i] & 255]++] = A[i];
        }
        swap = A; A = tmp; tmp = swap;
    }

    // 2nd LSB Byte Sort
    if (Z[Jump2] != A.length)
    {
        Z[256] = A.length - Z[256];
        for (i = 257; i < 512; ++i)
        {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (i = 0; i < A.length; ++i)
        {
            tmp[Z[((A[i] >> 8) & 255) + 256]++] = A[i];
        }
        swap = A; A = tmp; tmp = swap;
    }

    // 3rd LSB Byte Sort
    if (Z[Jump3] != A.length)
    {
        Z[512] = A.length - Z[512];
        for (i = 513; i < 768; ++i)
        {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (i = 0; i < A.length; ++i)
        {
            tmp[Z[((A[i] >> 16) & 255) + 512]++] = A[i];
        }
        swap = A; A = tmp; tmp = swap;
    }

    // 4th LSB Byte Sort
    if (Z[Jump4] != A.length)
    {
        Z[768] = A.length - Z[768];
        for (i = 769; i < Z.length; ++i)
        {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (i = 0; i < A.length; ++i)
        {
            tmp[Z[(A[i] >> 24) + 768]++] = A[i];
        }
        return tmp;
    }
    return A;
}
The Java version ran faster with the != sign than with the == sign:
if( Z[Jump] != A.length )
{
    // lines of code
}...
but in C the version below was, on average, 25% faster (with the == sign) than its counterpart with the != sign. Your hardware might react differently.
if( Z[Jump] == A.length );
else
{
    // lines of code
}...
Below is the C code ("long" on my machine is 32 bits):
long* Radix_2_ac_long(long *A, size_t N, long *Temp) // Sorts in ascending order
{
    size_t Z[1024] = {0};
    long *swp;
    size_t i, Jump, Jump2, Jump3, Jump4;

    // Short-circuit set-up
    Jump = *A & 255;
    Z[Jump] = 1;
    Jump2 = ((*A >> 8) & 255) + 256;
    Z[Jump2] = 1;
    Jump3 = ((*A >> 16) & 255) + 512;
    Z[Jump3] = 1;
    Jump4 = (*A >> 24) + 768;
    Z[Jump4] = 1;

    // Histograms creation
    for (i = 1; i < N; ++i)
    {
        ++Z[*(A+i) & 255];
        ++Z[((*(A+i) >> 8) & 255) + 256];
        ++Z[((*(A+i) >> 16) & 255) + 512];
        ++Z[(*(A+i) >> 24) + 768];
    }

    // 1st LSB byte sort
    if (Z[Jump] == N);
    else
    {
        for (i = 1; i < 256; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i < N; --i) // i is unsigned: the loop ends when i wraps around past 0
        {
            *(--Z[*(A+i) & 255] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 2nd LSB byte sort
    if (Z[Jump2] == N);
    else
    {
        for (i = 257; i < 512; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i < N; --i)
        {
            *(--Z[((*(A+i) >> 8) & 255) + 256] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 3rd LSB byte sort
    if (Z[Jump3] == N);
    else
    {
        for (i = 513; i < 768; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i < N; --i)
        {
            *(--Z[((*(A+i) >> 16) & 255) + 512] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 4th LSB byte sort
    if (Z[Jump4] == N);
    else
    {
        for (i = 769; i < 1024; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i < N; --i)
        {
            *(--Z[(*(A+i) >> 24) + 768] + Temp) = *(A+i);
        }
        return Temp;
    }
    return A;
}
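A usage sketch (my addition): the caller supplies a scratch buffer and must continue with the returned pointer, since the sorted data may end up in either array:
#include <stdio.h>
#include <stdlib.h>

/* Assumes Radix_2_ac_long() from above is in scope. */
int main(void)
{
    long data[] = { 170, 45, 75, 90, 802, 24, 2, 66 };
    size_t n = sizeof data / sizeof data[0];
    long *temp = malloc(n * sizeof *temp);
    long *sorted = Radix_2_ac_long(data, n, temp); /* may point at data OR temp */
    for (size_t i = 0; i < n; i++)
        printf("%ld ", sorted[i]);
    putchar('\n');
    free(temp);
    return 0;
}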
EDIT 5
The sort now handles negative numbers too. Only some minor/negligible tweaks to the code were needed. It runs a little slower as a result, but the effect is not significant. Coded in C, below ("long" on my system is 32 bits):
long* Radix_Sort(long *A, size_t N, long *Temp)
{
    size_t Z[1024] = {0};
    long *swp;
    size_t Jump, Jump2, Jump3, Jump4;
    long i;

    // Short-circuit set-up
    Jump = *A & 255;
    Z[Jump] = 1;
    Jump2 = ((*A >> 8) & 255) + 256;
    Z[Jump2] = 1;
    Jump3 = ((*A >> 16) & 255) + 512;
    Z[Jump3] = 1;
    Jump4 = ((*A >> 24) & 255) + 768;
    Z[Jump4] = 1;

    // Histograms creation
    for (i = 1; i < N; ++i)
    {
        ++Z[*(A+i) & 255];
        ++Z[((*(A+i) >> 8) & 255) + 256];
        ++Z[((*(A+i) >> 16) & 255) + 512];
        ++Z[((*(A+i) >> 24) & 255) + 768];
    }

    // 1st LSB byte sort
    if (Z[Jump] == N);
    else
    {
        for (i = 1; i < 256; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i >= 0; --i)
        {
            *(--Z[*(A+i) & 255] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 2nd LSB byte sort
    if (Z[Jump2] == N);
    else
    {
        for (i = 257; i < 512; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i >= 0; --i)
        {
            *(--Z[((*(A+i) >> 8) & 255) + 256] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 3rd LSB byte sort
    if (Z[Jump3] == N);
    else
    {
        for (i = 513; i < 768; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i >= 0; --i)
        {
            *(--Z[((*(A+i) >> 16) & 255) + 512] + Temp) = *(A+i);
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 4th LSB byte sort and negative numbers sort
    if (Z[Jump4] == N);
    else
    {
        // negative values' counts occupy indices 896 (= 768 + 128) through 1023
        for (i = 897; i < 1024; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        Z[768] = Z[768] + Z[1023];
        for (i = 769; i < 896; ++i)
        {
            Z[i] = Z[i-1] + Z[i];
        }
        for (i = N-1; i >= 0; --i)
        {
            *(--Z[((*(A+i) >> 24) & 255) + 768] + Temp) = *(A+i);
        }
        return Temp;
    }
    return A;
}
EDIT 6
Below is the pointer-optimized version (it accesses array locations via pointers), which takes on average approximately 20% less time to sort than the one above. It also uses 4 separate arrays for faster address calculation ("long" on my system is 32 bits).
long* Radix_Sort(long *A, size_t N, long *Temp)
{
    long Z1[256];
    long Z2[256];
    long Z3[256];
    long Z4[256];
    long T = 0;
    while (T != 256)
    {
        *(Z1+T) = 0;
        *(Z2+T) = 0;
        *(Z3+T) = 0;
        *(Z4+T) = 0;
        ++T;
    }

    size_t Jump, Jump2, Jump3, Jump4;
    // Short-circuit set-up
    Jump = *A & 255;
    Z1[Jump] = 1;
    Jump2 = (*A >> 8) & 255;
    Z2[Jump2] = 1;
    Jump3 = (*A >> 16) & 255;
    Z3[Jump3] = 1;
    Jump4 = (*A >> 24) & 255;
    Z4[Jump4] = 1;

    // Histograms creation
    long *swp = A + N;
    long *i = A + 1;
    for ( ; i != swp; ++i)
    {
        ++Z1[*i & 255];
        ++Z2[(*i >> 8) & 255];
        ++Z3[(*i >> 16) & 255];
        ++Z4[(*i >> 24) & 255];
    }

    // 1st LSB byte sort
    if (Z1[Jump] == N);
    else
    {
        swp = Z1 + 256;
        for (i = Z1+1; i != swp; ++i)
        {
            *i = *(i-1) + *i;
        }
        swp = A - 1;
        for (i = A+N-1; i != swp; --i)
        {
            *(--Z1[*i & 255] + Temp) = *i;
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 2nd LSB byte sort
    if (Z2[Jump2] == N);
    else
    {
        swp = Z2 + 256;
        for (i = Z2+1; i != swp; ++i)
        {
            *i = *(i-1) + *i;
        }
        swp = A - 1;
        for (i = A+N-1; i != swp; --i)
        {
            *(--Z2[(*i >> 8) & 255] + Temp) = *i;
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 3rd LSB byte sort
    if (Z3[Jump3] == N);
    else
    {
        swp = Z3 + 256;
        for (i = Z3+1; i != swp; ++i)
        {
            *i = *(i-1) + *i;
        }
        swp = A - 1;
        for (i = A+N-1; i != swp; --i)
        {
            *(--Z3[(*i >> 16) & 255] + Temp) = *i;
        }
        swp = A;
        A = Temp;
        Temp = swp;
    }

    // 4th LSB byte sort and negative numbers sort
    if (Z4[Jump4] == N);
    else
    {
        swp = Z4 + 256;
        for (i = Z4+129; i != swp; ++i)
        {
            *i = *(i-1) + *i;
        }
        *Z4 = *Z4 + *(Z4+255);
        swp = Z4 + 128;
        for (i = Z4+1; i != swp; ++i)
        {
            *i = *(i-1) + *i;
        }
        swp = A - 1;
        for (i = A+N-1; i != swp; --i)
        {
            *(--Z4[(*i >> 24) & 255] + Temp) = *i;
        }
        return Temp;
    }
    return A;
}
The edit 4 version is good enough if the original and temp arrays fit in cache. If the array size is much greater than the cache size, most of the overhead is due to the random-order writes to the arrays. A hybrid MSB/LSB radix sort can avoid this issue: for example, split the array into 256 bins according to the most significant byte, then do an LSB radix sort on each of the 256 bins. The idea here is that a pair of bins (original and temp) will fit within the cache, where random-order writes are not an issue (for most cache implementations).
For an 8 MB cache, the goal is for each of the bins to be < 4 MB in size = 1 million 32-bit integers, if the integers distribute evenly into the bins. This strategy works for array sizes up to 256 million 32-bit integers. For larger arrays, the MSB phase could split the array into 1024 bins, for up to 1 billion 32-bit integers. On my system, sorting 16,777,216 (2^24) 32-bit integers with a classic 8,8,8,8 LSB radix sort took 0.45 seconds, while the hybrid 8 MSB : 8,8,8 LSB took 0.24 seconds.
#include <algorithm> // std::swap
#include <cstddef>
#include <cstdint>

void RadixSort3(uint32_t * a, uint32_t * b, size_t count); // forward declaration

// split array into 256 bins according to most significant byte
void RadixSort(uint32_t * a, size_t count)
{
    size_t aIndex[260] = {0};            // count / index array
    uint32_t * b = new uint32_t [count]; // allocate temp array
    size_t i;
    for (i = 0; i < count; i++)          // generate histogram
        aIndex[1 + ((size_t)(a[i] >> 24))]++;
    for (i = 2; i < 257; i++)            // convert to indices
        aIndex[i] += aIndex[i-1];
    for (i = 0; i < count; i++)          // sort by msb
        b[aIndex[a[i] >> 24]++] = a[i];
    for (i = 256; i; i--)                // restore aIndex
        aIndex[i] = aIndex[i-1];
    aIndex[0] = 0;
    for (i = 0; i < 256; i++)            // radix sort the 256 bins
        RadixSort3(&b[aIndex[i]], &a[aIndex[i]], aIndex[i+1] - aIndex[i]);
    delete[] b;
}

// sort a bin by 3 least significant bytes
void RadixSort3(uint32_t * a, uint32_t * b, size_t count)
{
    size_t mIndex[3][256] = {0};         // count / index matrix
    size_t i, j, m, n;
    uint32_t u;
    if (count == 0)
        return;
    for (i = 0; i < count; i++) {        // generate histograms
        u = a[i];
        for (j = 0; j < 3; j++) {
            mIndex[j][(size_t)(u & 0xff)]++;
            u >>= 8;
        }
    }
    for (j = 0; j < 3; j++) {            // convert to indices
        m = 0;
        for (i = 0; i < 256; i++) {
            n = mIndex[j][i];
            mIndex[j][i] = m;
            m += n;
        }
    }
    for (j = 0; j < 3; j++) {            // radix sort
        for (i = 0; i < count; i++) {    // sort by current lsb
            u = a[i];
            m = (size_t)(u >> (j << 3)) & 0xff;
            b[mIndex[j][m]++] = u;
        }
        std::swap(a, b);                 // swap ptrs
    }
}
Example code for classic LSB radix sorts:
Example C++ LSB radix sort using 8,8,8,8 bit fields:
typedef unsigned int uint32_t;

void RadixSort(uint32_t * a, size_t count)
{
    size_t mIndex[4][256] = {0};         // count / index matrix
    uint32_t * b = new uint32_t [count]; // allocate temp array
    size_t i, j, m, n;
    uint32_t u;
    for (i = 0; i < count; i++) {        // generate histograms
        u = a[i];
        for (j = 0; j < 4; j++) {
            mIndex[j][(size_t)(u & 0xff)]++;
            u >>= 8;
        }
    }
    for (j = 0; j < 4; j++) {            // convert to indices
        m = 0;
        for (i = 0; i < 256; i++) {
            n = mIndex[j][i];
            mIndex[j][i] = m;
            m += n;
        }
    }
    for (j = 0; j < 4; j++) {            // radix sort
        for (i = 0; i < count; i++) {    // sort by current lsb
            u = a[i];
            m = (size_t)(u >> (j << 3)) & 0xff;
            b[mIndex[j][m]++] = u;
        }
        std::swap(a, b);                 // swap ptrs
    }
    delete[] b;
}
Example C++ code using 16,16 bit fields:
typedef unsigned int uint32_t;

uint32_t * RadixSort(uint32_t * a, size_t count)
{
    size_t mIndex[2][65536] = {0};       // count / index matrix
    uint32_t * b = new uint32_t [count]; // allocate temp array
    size_t i, j, m, n;
    uint32_t u;
    for (i = 0; i < count; i++) {        // generate histograms
        u = a[i];
        for (j = 0; j < 2; j++) {
            mIndex[j][(size_t)(u & 0xffff)]++;
            u >>= 16;
        }
    }
    for (j = 0; j < 2; j++) {            // convert to indices
        m = 0;
        for (i = 0; i < 65536; i++) {
            n = mIndex[j][i];
            mIndex[j][i] = m;
            m += n;
        }
    }
    for (j = 0; j < 2; j++) {            // radix sort
        for (i = 0; i < count; i++) {    // sort by current lsb
            u = a[i];
            m = (size_t)(u >> (j << 4)) & 0xffff;
            b[mIndex[j][m]++] = u;
        }
        std::swap(a, b);                 // swap ptrs
    }
    delete[] b;
    return(a);
}
N & 15, N & 31, N & 63 ... and so on: which of these bitwise operations takes the least time?
They are the same. Do not take it badly, but optimizing for speed without knowing how long things take may end up quite poorly. And even when you know the timing, hardware is very complicated nowadays and quite unpredictable. You program in Java, which is another layer of an insanely complex system. The same code may be faster today and slower tomorrow. You say "approximately 2.232891909840167 times faster"; in reality, you have a measurement on one hardware and software configuration with one set of data, and you can only hope the measurement is representative enough. Unfortunately, that is not always the case.
I rewrote your function. It is shorter and simpler, yet does not seem to be slower. Compilers tend to like code that is not too clever, as there are many optimizations for simple cases. The correction for negative numbers is not particularly nice; you can delete it if you do not like it. It seems to work best for 8 bits and 11 bits, probably due to cache sizes; have a look at the comments by rcgldr.
EDIT
@ytoamn you are right: if everything is in the first bucket, the loop should continue, not break. That was a bug. As for the other changes, I would rather avoid the contract you have now. I think there are three natural contracts for a sorting function. The first is sorting the original array and returning null. The second is sorting the original array and returning it. The third is returning a new sorted array and keeping the original array intact. I like the first one, as its behaviour is unambiguous. The way you have it now, you should add a big warning to the documentation that the original array is changed and is returned from the function in some cases but not in others. The second thing I would avoid is the old C code style. You should define the loop variable in the loop if you need it only there. Defining it globally injects dependencies that may lead to bugs, and it has no advantage here, as properly defined loop variables would share the space in the end anyway. The compiler is well aware of the scope; you should use the smallest scope you need.
EDIT2
Feel free to comment directly under my post :-) Local variables are just addresses on the stack. You allocate memory when constructing an object, which is not the case here. As for the array, think about this code:
public static void Tst(int[] A) {
    int[] tmp = new int[A.length];
    A[0] = 6;
    A = tmp; // changes what parameter A contains
    A[0] = 7;
}

public static void main(String[] args) {
    int[] A = new int[1];
    A[0] = 5;
    Tst(A);
    System.out.println(A[0]); // prints 6
}
It prints 6. The number 7 is written into the tmp array only. Array A in main is not affected.
protected static void ASC2(int A[], int bits) {
    int[] origA = A;
    int[] tmp = new int[A.length];
    int[] Z = new int[1 << bits];
    int mask = (1 << bits) - 1;
    for (int shift = 0; shift < 32; shift += bits) {
        if (shift > 0) {
            Arrays.fill(Z, 0);
        }
        for (int i = 0; i < A.length; ++i) {
            Z[(A[i] >> shift) & mask]++;
        }
        if (Z[0] == A.length) {
            continue; // all in first bucket
        }
        Z[Z.length - 1] = A.length - Z[Z.length - 1];
        for (int i = Z.length - 2; i >= 0; --i) {
            Z[i] = Z[i + 1] - Z[i];
        }
        if (shift + bits > 31) { // negative numbers correction
            int halfLength = Z.length / 2;
            int positSum = Z[halfLength];
            int negSum = A.length - positSum;
            if (negSum > 0) {
                for (int i = 0; i < halfLength; ++i) {
                    Z[i] += negSum;
                }
                for (int i = halfLength; i < Z.length; ++i) {
                    Z[i] -= positSum;
                }
            }
        }
        for (int i = 0; i < A.length; ++i) {
            tmp[Z[(A[i] >> shift) & mask]++] = A[i];
        }
        int[] swap = A;
        A = tmp;
        tmp = swap;
    }
    if (A != origA) {
        System.arraycopy(A, 0, origA, 0, A.length);
    }
}
EDIT3
Loop unrolling is a valid technique, and improving the short-circuiting is really nice. But with using array lengths as constants, you definitely start to be too clever. If you hard-coded the base size, why not hard-code it all, like this:
protected static int[] DSC2(int A[]) // sorts in descending order
{
    int tmp[] = new int[A.length];
    int Z[] = new int[256];
    int sample, swap[];

    // 1st LSB byte extraction
    sample = A[0] & 255;
    for (int i = 0; i < A.length; ++i) {
        Z[A[i] & 255]++;
    }
    if (Z[sample] != A.length) {
        Z[0] = A.length - Z[0];
        for (int i = 1; i < Z.length; ++i) {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (int i = 0; i < A.length; ++i) {
            tmp[Z[A[i] & 255]++] = A[i];
        }
        swap = A;
        A = tmp;
        tmp = swap;
        Arrays.fill(Z, 0);
    } else {
        Z[sample] = 0;
    }

    // 2nd LSB byte extraction
    sample = (A[0] >> 8) & 255;
    for (int i = 0; i < A.length; ++i) {
        Z[(A[i] >> 8) & 255]++;
    }
    if (Z[sample] != A.length) {
        Z[0] = A.length - Z[0];
        for (int i = 1; i < Z.length; ++i) {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (int i = 0; i < A.length; ++i) {
            tmp[Z[(A[i] >> 8) & 255]++] = A[i];
        }
        swap = A;
        A = tmp;
        tmp = swap;
        Arrays.fill(Z, 0);
    } else {
        Z[sample] = 0;
    }

    // 3rd LSB byte extraction
    sample = (A[0] >> 16) & 255;
    for (int i = 0; i < A.length; ++i) {
        Z[(A[i] >> 16) & 255]++;
    }
    if (Z[sample] != A.length) {
        Z[0] = A.length - Z[0];
        for (int i = 1; i < Z.length; ++i) {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (int i = 0; i < A.length; ++i) {
            tmp[Z[(A[i] >> 16) & 255]++] = A[i];
        }
        swap = A;
        A = tmp;
        tmp = swap;
        Arrays.fill(Z, 0);
    } else {
        Z[sample] = 0;
    }

    // 4th LSB byte extraction
    sample = (A[0] >> 24) & 255;
    for (int i = 0; i < A.length; ++i) {
        Z[(A[i] >> 24) & 255]++;
    }
    if (Z[sample] != A.length) {
        Z[0] = A.length - Z[0];
        for (int i = 1; i < Z.length; ++i) {
            Z[i] = Z[i - 1] - Z[i];
        }
        for (int i = 0; i < A.length; ++i) {
            tmp[Z[(A[i] >> 24) & 255]++] = A[i];
        }
        A = tmp;
    }
    return A;
}

YUV420 to RGB color conversion Error

I am converting an image in YUV420 format to an RGB image in OpenCV, but I'm getting an orange-colored image after the conversion. I used the following code to do that. Is there any problem in my code?
int step = origImage->widthStep;
uchar *data = (uchar *)origImage->imageData;
int size = origImage->width * origImage->height;
IplImage* img1 = cvCreateImage(cvGetSize(origImage), IPL_DEPTH_8U, 3);
for (int i = 0; i < origImage->height; i++)
{
    for (int j = 0; j < origImage->width; j++)
    {
        float Y = data[i*step + j];
        float U = data[(int)(size + (i/2)*(step/2) + j/2)];
        float V = data[(int)(size*1.25 + (i/2)*(step/2) + j/2)];
        float R = Y + (int)(1.772f*V);
        float G = Y - (int)(0.344f*V + 0.714f*U);
        float B = Y + (int)(1.402f*U);
        if (R < 0) { R = 0; } if (G < 0) { G = 0; } if (B < 0) { B = 0; }
        if (R > 255) { R = 255; } if (G > 255) { G = 255; } if (B > 255) { B = 255; }
        cvSet2D(img1, i, j, cvScalar(B, G, R));
    }
}
origImage -> YUV image, img1 -> RGB image.
YUV420 layout: http://upload.wikimedia.org/wikipedia/en/0/0d/Yuv420.svg
Is there any OpenCV function which can convert a pixel in YUV420 format to the corresponding RGB pixel (not the entire image)?
I got the answer by modifying the formula for calculating the R, G, B values.
This code is working fine:
int step = origImage->widthStep;
uchar *data = (uchar *)origImage->imageData;
int size = origImage->width * origImage->height;
IplImage* img1 = cvCreateImage(cvGetSize(origImage), IPL_DEPTH_8U, 3);
for (int i = 0; i < origImage->height; i++)
{
    for (int j = 0; j < origImage->width; j++)
    {
        float Y = data[i*step + j];
        float U = data[(int)(size + (i/2)*(step/2) + j/2)];
        float V = data[(int)(size*1.25 + (i/2)*(step/2) + j/2)];
        float R = Y + 1.402 * (V - 128);
        float G = Y - 0.344 * (U - 128) - 0.714 * (V - 128);
        float B = Y + 1.772 * (U - 128);
        if (R < 0) { R = 0; } if (G < 0) { G = 0; } if (B < 0) { B = 0; }
        if (R > 255) { R = 255; } if (G > 255) { G = 255; } if (B > 255) { B = 255; }
        cvSet2D(img1, i, j, cvScalar(B, G, R));
    }
}
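To answer the single-pixel part of the question: I'm not aware of a per-pixel OpenCV call, but the corrected formula is easy to wrap in a small helper. A sketch (mine; yuv420_pixel_to_rgb is a made-up name, and it assumes a tightly packed I420 buffer with stride == width, unlike the IplImage code above which uses widthStep):
/* Hypothetical helper: fetch one pixel (x, y) from an I420 buffer
 * (full-size Y plane followed by quarter-size U and V planes) and
 * convert it with the corrected formulas from the post above. */
static void yuv420_pixel_to_rgb(const unsigned char *data,
                                int width, int height, int x, int y,
                                unsigned char *r, unsigned char *g, unsigned char *b)
{
    int size = width * height;
    float Y = data[y * width + x];
    float U = data[size + (y / 2) * (width / 2) + x / 2];
    float V = data[size + size / 4 + (y / 2) * (width / 2) + x / 2];
    float R = Y + 1.402f * (V - 128);
    float G = Y - 0.344f * (U - 128) - 0.714f * (V - 128);
    float B = Y + 1.772f * (U - 128);
    *r = (unsigned char)(R < 0 ? 0 : R > 255 ? 255 : R);
    *g = (unsigned char)(G < 0 ? 0 : G > 255 ? 255 : G);
    *b = (unsigned char)(B < 0 ? 0 : B > 255 ? 255 : B);
}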
The 1st problem is using the outdated C API (it's dead and gone; please use C++ instead).
The 2nd problem is writing your own (slow and error-prone) pixel loops.
Why not use:
    cvtColor(src, dst, CV_YUV2BGR); // or CV_YUV2BGR_I420
instead?

How to convert a 128-bit integer to a decimal ascii string in C?

I'm trying to convert a 128-bit unsigned integer stored as an array of 4 unsigned ints to the decimal string representation in C:
unsigned int src[] = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
printf("%s", some_func(src)); // gives "53072739890371098123344"
(The input and output examples above are completely fictional; I have no idea what that input would produce.)
If I were going to hex, binary or octal, this would be a simple matter of masks and bit shifts to peel off the least significant characters. However, it seems to me that I need to do base-10 division. Unfortunately, I can't remember how to do that across multiple ints, and the system I'm using doesn't support data types larger than 32 bits, so using a 128-bit type is not possible. Using a different language is also out, and I'd rather avoid a big-number library just for this one operation.
Division is not necessary:
#include <string.h>
#include <stdio.h>

typedef unsigned long uint32;

/* N[0] - contains least significant bits, N[3] - most significant */
char* Bin128ToDec(const uint32 N[4])
{
    // log10(x) = log2(x) / log2(10) ~= log2(x) / 3.322
    static char s[128 / 3 + 1 + 1];
    uint32 n[4];
    char* p = s;
    int i;

    memset(s, '0', sizeof(s) - 1);
    s[sizeof(s) - 1] = '\0';
    memcpy(n, N, sizeof(n));

    for (i = 0; i < 128; i++)
    {
        int j, carry;

        carry = (n[3] >= 0x80000000);
        // Shift n[] left, doubling it
        n[3] = ((n[3] << 1) & 0xFFFFFFFF) + (n[2] >= 0x80000000);
        n[2] = ((n[2] << 1) & 0xFFFFFFFF) + (n[1] >= 0x80000000);
        n[1] = ((n[1] << 1) & 0xFFFFFFFF) + (n[0] >= 0x80000000);
        n[0] = ((n[0] << 1) & 0xFFFFFFFF);

        // Add s[] to itself in decimal, doubling it
        for (j = sizeof(s) - 2; j >= 0; j--)
        {
            s[j] += s[j] - '0' + carry;
            carry = (s[j] > '9');
            if (carry)
            {
                s[j] -= 10;
            }
        }
    }

    while ((p[0] == '0') && (p < &s[sizeof(s) - 2]))
    {
        p++;
    }

    return p;
}

int main(void)
{
    static const uint32 testData[][4] =
    {
        { 0, 0, 0, 0 },
        { 1048576, 0, 0, 0 },
        { 0xFFFFFFFF, 0, 0, 0 },
        { 0, 1, 0, 0 },
        { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 }
    };
    printf("%s\n", Bin128ToDec(testData[0]));
    printf("%s\n", Bin128ToDec(testData[1]));
    printf("%s\n", Bin128ToDec(testData[2]));
    printf("%s\n", Bin128ToDec(testData[3]));
    printf("%s\n", Bin128ToDec(testData[4]));
    return 0;
}
Output:
0
1048576
4294967295
4294967296
11248221411398543556294285637029484152
Straightforward division base 2^32, prints decimal digits in reverse order, uses 64-bit arithmetic, complexity O(n) where n is the number of decimal digits in the representation:
#include <stdio.h>

unsigned int a[] = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
/* 24197857161011715162171839636988778104 */

int main(void)
{
    unsigned long long d, r;
    do
    {
        r = a[0];
        d = r / 10;
        r = ((r - d * 10) << 32) + a[1];
        a[0] = d;
        d = r / 10;
        r = ((r - d * 10) << 32) + a[2];
        a[1] = d;
        d = r / 10;
        r = ((r - d * 10) << 32) + a[3];
        a[2] = d;
        d = r / 10;
        r = r - d * 10;
        a[3] = d;
        printf("%u\n", (unsigned int) r);
    }
    while (a[0] || a[1] || a[2] || a[3]);
    return 0;
}
EDIT: Corrected the loop so it displays a 0 if the array a contains only zeros.
Also, the array is read left to right, a[0] is most-significant, a[3] is least significant digits.
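The unrolled divide-by-10 steps above generalize to a short loop over the limbs; here is a sketch (mine, not chill's) that also buffers the digits so they print most significant first:
#include <stdio.h>

/* Sketch: the same base-2^32 long division, looped over the limbs.
 * a[0] is the most significant limb, a[3] the least significant; a is consumed. */
static void print_u128(unsigned int a[4])
{
    char buf[40]; /* 2^128 - 1 has 39 decimal digits */
    int pos = (int)sizeof buf;
    buf[--pos] = '\0';
    do
    {
        unsigned long long r = 0;
        for (int i = 0; i < 4; i++)
        {
            r = (r << 32) + a[i];         /* bring down the next limb */
            a[i] = (unsigned int)(r / 10);
            r %= 10;
        }
        buf[--pos] = (char)('0' + (int)r);
    } while (a[0] || a[1] || a[2] || a[3]);
    printf("%s\n", &buf[pos]);
}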
A slow but simple approach is to just print digits from most significant to least significant using subtraction. Basically you need a function for checking if x >= y and another for computing x -= y when that is the case.
Then you can count how many times you can subtract 10^38 (this will be the most significant digit), then how many times you can subtract 10^37, ... down to how many times you can subtract 1.
The following is a full implementation of this approach:
#include <stdio.h>

typedef unsigned ui128[4];

int ge128(ui128 a, ui128 b)
{
    int i = 3;
    while (i >= 0 && a[i] == b[i])
        --i;
    return i < 0 ? 1 : a[i] >= b[i];
}

void sub128(ui128 a, ui128 b)
{
    int i = 0;
    int borrow = 0;
    while (i < 4)
    {
        int next_borrow = (borrow && a[i] <= b[i]) || (!borrow && a[i] < b[i]);
        a[i] -= b[i] + borrow;
        borrow = next_borrow;
        i += 1;
    }
}

/* Powers of ten from 10^0 up to 10^38, least significant word first */
ui128 deci128[] = {{1u,0u,0u,0u},
{10u,0u,0u,0u},
{100u,0u,0u,0u},
{1000u,0u,0u,0u},
{10000u,0u,0u,0u},
{100000u,0u,0u,0u},
{1000000u,0u,0u,0u},
{10000000u,0u,0u,0u},
{100000000u,0u,0u,0u},
{1000000000u,0u,0u,0u},
{1410065408u,2u,0u,0u},
{1215752192u,23u,0u,0u},
{3567587328u,232u,0u,0u},
{1316134912u,2328u,0u,0u},
{276447232u,23283u,0u,0u},
{2764472320u,232830u,0u,0u},
{1874919424u,2328306u,0u,0u},
{1569325056u,23283064u,0u,0u},
{2808348672u,232830643u,0u,0u},
{2313682944u,2328306436u,0u,0u},
{1661992960u,1808227885u,5u,0u},
{3735027712u,902409669u,54u,0u},
{2990538752u,434162106u,542u,0u},
{4135583744u,46653770u,5421u,0u},
{2701131776u,466537709u,54210u,0u},
{1241513984u,370409800u,542101u,0u},
{3825205248u,3704098002u,5421010u,0u},
{3892314112u,2681241660u,54210108u,0u},
{268435456u,1042612833u,542101086u,0u},
{2684354560u,1836193738u,1126043566u,1u},
{1073741824u,1182068202u,2670501072u,12u},
{2147483648u,3230747430u,935206946u,126u},
{0u,2242703233u,762134875u,1262u},
{0u,952195850u,3326381459u,12621u},
{0u,932023908u,3199043520u,126217u},
{0u,730304488u,1925664130u,1262177u},
{0u,3008077584u,2076772117u,12621774u},
{0u,16004768u,3587851993u,126217744u},
{0u,160047680u,1518781562u,1262177448u}};
void print128(ui128 x)
{
    int i = 38;
    int z = 0;
    while (i >= 0)
    {
        int c = 0;
        while (ge128(x, deci128[i]))
        {
            c++; sub128(x, deci128[i]);
        }
        if (i == 0 || z || c > 0)
        {
            z = 1; putchar('0' + c);
        }
        --i;
    }
}

int main(int argc, const char *argv[])
{
    ui128 test = { 0x12345678, 0x90abcdef, 0xfedcba90, 0x8765421 };
    print128(test);
    return 0;
}
That number in the problem text in decimal becomes
11248221411398543556294285637029484152
and Python agrees this is the correct value (this of course doesn't mean the code is correct!!! ;-) )
Same thing, but with 32-bit integer arithmetic:
#include <stdio.h>

unsigned short a[] = {
    0x0876, 0x5421,
    0xfedc, 0xba90,
    0x90ab, 0xcdef,
    0x1234, 0x5678
};

int main(void)
{
    unsigned int d, r;
    do
    {
        r = a[0];
        d = r / 10;
        r = ((r - d * 10) << 16) + a[1];
        a[0] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[2];
        a[1] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[3];
        a[2] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[4];
        a[3] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[5];
        a[4] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[6];
        a[5] = d;
        d = r / 10;
        r = ((r - d * 10) << 16) + a[7];
        a[6] = d;
        d = r / 10;
        r = r - d * 10;
        a[7] = d;
        printf("%u\n", r);
    }
    while (a[0] || a[1] || a[2] || a[3] || a[4] || a[5] || a[6] || a[7]);
    return 0;
}
You actually don't need to implement long division. You need to implement multiplication by a power of two, and addition. You have four uint32_t values. First convert each of them to a decimal string. Multiply them by (2^32)^3, (2^32)^2, (2^32)^1, and (2^32)^0 respectively, then add them together. You don't need to do the base conversion digit by digit; you just need to handle putting the four pieces together. You'll obviously need to make sure the strings can handle a number up to UINT32_MAX * (2^32)^3.
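The missing primitive in that description is decimal-string addition. A minimal sketch (my illustration, with fixed-width zero-padded buffers; dec_add and DIGITS are made-up names):
#include <stdio.h>
#include <string.h>

#define DIGITS 40 /* room for UINT32_MAX * (2^32)^3, which has at most 39 digits */

/* Sketch: a += b, where both are fixed-width, zero-padded decimal strings,
 * most significant digit first. Assumes the sum still fits in DIGITS digits. */
static void dec_add(char a[DIGITS + 1], const char b[DIGITS + 1])
{
    int carry = 0;
    for (int i = DIGITS - 1; i >= 0; i--)
    {
        int sum = (a[i] - '0') + (b[i] - '0') + carry;
        a[i] = (char)('0' + sum % 10);
        carry = sum / 10;
    }
}

int main(void)
{
    char a[DIGITS + 1], b[DIGITS + 1];
    memset(a, '0', DIGITS); a[DIGITS] = '\0';
    memset(b, '0', DIGITS); b[DIGITS] = '\0';
    a[DIGITS - 2] = '9'; a[DIGITS - 1] = '5'; /* a = 95 */
    b[DIGITS - 1] = '7';                      /* b = 7  */
    dec_add(a, b);
    printf("%s\n", a); /* zero-padded: ...000102 */
    return 0;
}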
Supposing you have fast 32-bit multiplication and division, the result can be computed 4 digits at a time by implementing a bigint division/modulo 10000 and then using (s)printf for output of the digit groups.
This approach is also trivial to extend to higher (or even variable) precision...
#include <stdio.h>

typedef unsigned long bigint[4];

void print_bigint(bigint src)
{
    unsigned long int x[8]; // expanded version (16 bit per element)
    int result[12];         // 4 digits per element
    int done = 0;           // did we finish?
    int i = 0;              // digit group counter

    /* expand to 16-bit per element */
    x[0] = src[0] & 65535;
    x[1] = src[0] >> 16;
    x[2] = src[1] & 65535;
    x[3] = src[1] >> 16;
    x[4] = src[2] & 65535;
    x[5] = src[2] >> 16;
    x[6] = src[3] & 65535;
    x[7] = src[3] >> 16;

    while (!done)
    {
        done = 1;
        {
            unsigned long carry = 0;
            int j;
            for (j = 7; j >= 0; j--)
            {
                unsigned long d = (carry << 16) + x[j];
                x[j] = d / 10000;
                carry = d - x[j] * 10000;
                if (x[j]) done = 0;
            }
            result[i++] = carry;
        }
    }

    printf("%i", result[--i]);
    while (i > 0)
    {
        printf("%04i", result[--i]);
    }
}

int main(int argc, const char *argv[])
{
    bigint tests[] = { { 0, 0, 0, 0 },
                       { 0xFFFFFFFFUL, 0, 0, 0 },
                       { 0, 1, 0, 0 },
                       { 0x12345678UL, 0x90abcdefUL, 0xfedcba90UL, 0x8765421UL } };
    int i;
    for (i = 0; i < 4; i++)
    {
        print_bigint(tests[i]);
        printf("\n");
    }
    return 0;
}
@Alexey Frunze's method is easy, but it's very slow. You should use @chill's 32-bit integer method above. Another easy method without any multiplication or division is double dabble. This may run slower than chill's algorithm but much faster than Alexey's. After running it you'll have a packed BCD of the decimal number.
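For reference, a sketch of double dabble for a 128-bit value (my addition, not from the answer; it keeps one decimal digit per byte instead of packed BCD, for clarity):
#include <stdio.h>
#include <stdint.h>

/* Double dabble sketch: n[0] is the most significant 32-bit word. */
static void double_dabble_128(const uint32_t n[4], char out[40])
{
    unsigned char digit[39] = {0}; /* digit[0] = least significant; 2^128-1 has 39 digits */
    for (int bit = 0; bit < 128; bit++)
    {
        int carry = (n[bit / 32] >> (31 - bit % 32)) & 1; /* next input bit, MSB first */
        for (int i = 0; i < 39; i++)
        {
            int d = digit[i];
            if (d >= 5) d += 3;  /* "dabble": adjust digits >= 5 before the shift */
            d = d * 2 + carry;   /* "double": shift left, bringing in the carry */
            carry = d >> 4;      /* bit that moves up into the next digit */
            digit[i] = (unsigned char)(d & 0xF);
        }
    }
    int i = 38, p = 0;
    while (i > 0 && digit[i] == 0) i--; /* skip leading zeros */
    for (; i >= 0; i--) out[p++] = (char)('0' + digit[i]);
    out[p] = '\0';
}

int main(void)
{
    /* Same value as in the question, most significant word first. */
    const uint32_t n[4] = { 0x08765421, 0xfedcba90, 0x90abcdef, 0x12345678 };
    char buf[40];
    double_dabble_128(n, buf);
    printf("%s\n", buf); /* should print 11248221411398543556294285637029484152 */
    return 0;
}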
On GitHub there is an open-source project (C++) which provides classes for the datatypes uint256_t and uint128_t:
https://github.com/calccrypto/uint256_t
No, I'm not affiliated with that project, but I was using it for such a purpose, and I guess it could be useful for others as well.

Resources