Interleave 4 byte ints to 8 byte int - c

I'm currently working to create a function which accepts two 4 byte unsigned integers, and returns an 8 byte unsigned long. I've tried to base my work off of the methods depicted by this research but all my attempts have been unsuccessful. The specific inputs I am working with are: 0x12345678 and 0xdeadbeef, and the result I'm looking for is 0x12de34ad56be78ef. This is my work so far:
unsigned long interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
int shift = 33;
for(int i = 64; i > 0; i-=16){
shift -= 8;
//printf("%d\n", i);
//printf("%d\n", shift);
result |= (x & i) << shift;
result |= (y & i) << (shift-1);
}
}
However, this function keeps returning 0xfffffffe which is incorrect. I am printing and verifying these values using:
printf("0x%x\n", z);
and the input is initialized like so:
uint32_t x = 0x12345678;
uint32_t y = 0xdeadbeef;
Any help on this topic would be greatly appreciated, C has been a very difficult language for me, and bitwise operations even more so.

This can be done based on interleaving bits, but skipping some steps so it only interleaves bytes. Same idea: first spread out the bytes in a couple of steps, then combine them.
Here is the plan (the original answer illustrated it with a freehand drawing, which is not reproduced here).
In C (not tested):
// step 1, moving the top two bytes
uint64_t a = (((uint64_t)x & 0xFFFF0000) << 16) | (x & 0xFFFF);
// step 2, moving bytes 1 and 5 up to positions 2 and 6
a = ((a & 0x0000FF000000FF00) << 8) | (a & 0x000000FF000000FF);
// same thing with y
uint64_t b = (((uint64_t)y & 0xFFFF0000) << 16) | (y & 0xFFFF);
b = ((b & 0x0000FF000000FF00) << 8) | (b & 0x000000FF000000FF);
// merge them
uint64_t result = (a << 8) | b;
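As a sanity check, the two spreading steps can be wrapped into small helpers and run against the values from the question. This is only a minimal sketch: the names spread_bytes and interleave_spread are made up here, and the step 2 mask selects the upper byte of each 16-bit pair (bits 8-15 and 40-47), which is where step 1 leaves the bytes that still have to move.

```c
#include <stdint.h>

/* Hypothetical helper: spread the 4 bytes of v into the even byte
   positions of a 64-bit word, giving 00 v3 00 v2 00 v1 00 v0. */
static uint64_t spread_bytes(uint32_t v)
{
    /* step 1: move the top two bytes up by 16 bits */
    uint64_t a = (((uint64_t)v & 0xFFFF0000u) << 16) | (v & 0xFFFFu);
    /* step 2: move the upper byte of each 16-bit pair up by 8 bits */
    a = ((a & 0x0000FF000000FF00ull) << 8) | (a & 0x000000FF000000FFull);
    return a;
}

/* Bytes of x land in the odd positions, bytes of y in the even ones. */
uint64_t interleave_spread(uint32_t x, uint32_t y)
{
    return (spread_bytes(x) << 8) | spread_bytes(y);
}
```

With the inputs from the question, interleave_spread(0x12345678, 0xdeadbeef) gives 0x12de34ad56be78ef.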
Using SSSE3 PSHUFB has been suggested. It will work, but there is an instruction that can do a byte-wise interleave in one go: punpcklbw. So all we really need to do is get the values into and out of vector registers, and that single instruction will take care of the rest.
Not tested:
#include <emmintrin.h> // SSE2 intrinsics; _mm_cvtsi128_si64 requires a 64-bit x86 target
uint64_t interleave(uint32_t x, uint32_t y) {
__m128i xvec = _mm_cvtsi32_si128(x);
__m128i yvec = _mm_cvtsi32_si128(y);
__m128i interleaved = _mm_unpacklo_epi8(yvec, xvec);
return _mm_cvtsi128_si64(interleaved);
}

With bit-shifting and bitwise operations (endianness independent):
uint64_t interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
for(uint8_t i = 0; i < 4; i ++){
result |= ((x & (0xFFull << (8*i))) << (8*(i+1)));
result |= ((y & (0xFFull << (8*i))) << (8*i));
}
return result;
}
With pointers (endianness dependent):
uint64_t interleave(uint32_t x, uint32_t y){
uint64_t result = 0;
uint8_t * x_ptr = (uint8_t *)&x;
uint8_t * y_ptr = (uint8_t *)&y;
uint8_t * r_ptr = (uint8_t *)&result;
for(uint8_t i = 0; i < 4; i++){
*(r_ptr++) = y_ptr[i];
*(r_ptr++) = x_ptr[i];
}
return result;
}
Note: this solution assumes little-endian byte order
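Whichever variant is used, the result has to be printed with a 64-bit conversion: %x expects an unsigned int, so passing a uint64_t to it is undefined and typically shows only part of the value. A minimal sketch using PRIx64 from <inttypes.h> follows. The helper name format_u64_hex is made up for illustration, and interleave_shift is just a renamed copy of the shift-based version above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Renamed copy of the shift-based (endianness independent) version. */
uint64_t interleave_shift(uint32_t x, uint32_t y)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 4; i++) {
        result |= (x & (0xFFull << (8 * i))) << (8 * (i + 1));
        result |= (y & (0xFFull << (8 * i))) << (8 * i);
    }
    return result;
}

/* Hypothetical helper: write value into buf as 0x-prefixed,
   zero-padded 16-digit hex. Returns what snprintf returns. */
int format_u64_hex(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "0x%016" PRIx64, value);
}
```

Printing directly works the same way: printf("0x%016" PRIx64 "\n", z);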

You could do it like this:
uint64_t interleave(uint32_t x, uint32_t y)
{
uint64_t z;
unsigned char *a = (unsigned char *)&x; // 1
unsigned char *b = (unsigned char *)&y; // 1
unsigned char *c = (unsigned char *)&z;
c[0] = a[0];
c[1] = b[0];
c[2] = a[1];
c[3] = b[1];
c[4] = a[2];
c[5] = b[2];
c[6] = a[3];
c[7] = b[3];
return z;
}
Interchange a and b on the lines marked 1 depending on ordering requirement.
A version with shifts, where the LSB of y is always the LSB of the output as in your example, is:
uint64_t interleave(uint32_t x, uint32_t y)
{
return
(y & 0xFFull)
| (x & 0xFFull) << 8
| (y & 0xFF00ull) << 8
| (x & 0xFF00ull) << 16
| (y & 0xFF0000ull) << 16
| (x & 0xFF0000ull) << 24
| (y & 0xFF000000ull) << 24
| (x & 0xFF000000ull) << 32;
}
The compilers I tried don't seem to do a good job of optimizing either version so if this is a performance critical situation then maybe the inline assembly suggestion from comments is the way to go.

Use union punning; it is easy for the compiler to optimize. (Like the pointer versions, this depends on little-endian byte order.)
#include <stdio.h>
#include <stdint.h>
#include <string.h>
typedef union
{
uint64_t u64;
struct
{
union
{
uint32_t a32;
uint8_t a8[4];
};
union
{
uint32_t b32;
uint8_t b8[4];
};
};
uint8_t u8[8];
}data_64;
uint64_t interleave(uint32_t a, uint32_t b)
{
data_64 in , out;
in.a32 = a;
in.b32 = b;
for(size_t index = 0; index < sizeof(a); index ++)
{
out.u8[index * 2 + 1] = in.a8[index];
out.u8[index * 2 ] = in.b8[index];
}
return out.u64;
}
int main(void)
{
printf("%llx\n", (unsigned long long)interleave(0x12345678U, 0xdeadbeefU));
}

Related

Circular shift 28 bits within 4 bytes in C

I have an unsigned char *Buffer that contains 4 bytes, but only 28 of them are relevant to me.
I am looking to create a function that will do a circular shift of the 28 bits while ignoring the remaining 4 bits.
For example, I have the following within *Buffer
1111000011001100101010100000
Say I want to left circular shift by 1 bit of the 28 bits, making it
1110000110011001010101010000
I have looked around and I can't figure out how to get the shift, ignore the last 4 bits, and have the ability to shift either 1, 2, 3, or 4 bits depending on a variable set earlier in the program.
Any help with this would be smashing! Thanks in advance.
Only 1 bit at a time, but this does a 28 bit circular shift
uint32_t csl28(uint32_t value) {
uint32_t overflow_mask = 0x08000000;
uint32_t value_mask = 0x07FFFFFF;
return ((value & value_mask) << 1) | ((value & overflow_mask) >> 27);
}
uint32_t csr28(uint32_t value) {
uint32_t overflow_mask = 0x00000001;
uint32_t value_mask = 0x0FFFFFFE;
return ((value & value_mask) >> 1) | ((value & overflow_mask) << 27);
}
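Since the question asks for shifts of 1 to 4 bits, the same idea extends to an arbitrary count. This is a minimal sketch, assuming the 28 relevant bits sit in the low bits of a uint32_t; the names rotl28, rotr28, FIELD_BITS and FIELD_MASK are made up here.

```c
#include <stdint.h>

#define FIELD_BITS 28u
#define FIELD_MASK 0x0FFFFFFFu

/* Rotate the low 28 bits of value left by count.
   Anything in the top 4 bits is discarded. */
uint32_t rotl28(uint32_t value, unsigned count)
{
    value &= FIELD_MASK;
    count %= FIELD_BITS;
    if (count == 0)
        return value;
    return ((value << count) | (value >> (FIELD_BITS - count))) & FIELD_MASK;
}

/* Rotating right by count is the same as rotating left by 28 - count. */
uint32_t rotr28(uint32_t value, unsigned count)
{
    return rotl28(value, FIELD_BITS - (count % FIELD_BITS));
}
```

For the bit pattern in the question, 1111000011001100101010100000 is 0x0F0CCAA0, and rotl28(0x0F0CCAA0, 1) gives 0x0E199541, i.e. 1110000110011001010101000001.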
Another version, based on this article. This shifts an arbitrary number of bits (count) within an arbitrarily wide bit field (width). To left shift a value 5 bits in a 23 bit wide field: rotl32(value, 5, 23);
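uint32_t rotl32 (uint32_t value, uint32_t count, uint32_t width) {
uint32_t value_mask = ((uint32_t)~0) >> (CHAR_BIT * sizeof(value) - width);
value &= value_mask; // ignore anything above the field
count %= width; // width need not be a power of two, so reduce with %, not with a mask
if (count == 0) return value; // avoids a shift by the full width below
return value_mask & ((value<<count) | (value>>(width - count)));
}
uint32_t rotr32 (uint32_t value, uint32_t count, uint32_t width) {
uint32_t value_mask = ((uint32_t)~0) >> (CHAR_BIT * sizeof(value) - width);
value &= value_mask; // ignore anything above the field
count %= width; // width need not be a power of two, so reduce with %, not with a mask
if (count == 0) return value; // avoids a shift by the full width below
return value_mask & ((value>>count) | (value<<(width - count)));
}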
The above functions assume the value is stored in the low order bits of "value"
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <limits.h> /* CHAR_BIT, used by rotl32 and rotr32 */
const char *uint32_to_binary(uint32_t x)
{
static char b[33];
b[0] = '\0';
uint32_t z;
for (z = 0x80000000; z > 0; z >>= 1)
{
strcat(b, ((x & z) == z) ? "1" : "0");
}
return b;
}
uint32_t reverse(uint32_t value)
{
return (value & 0x000000FF) << 24 | (value & 0x0000FF00) << 8 |
(value & 0x00FF0000) >> 8 | (value & 0xFF000000) >> 24;
}
int is_big_endian(void)
{
union {
uint32_t i;
char c[4];
} bint = {0x01020304};
return bint.c[0] == 1;
}
int main(int argc, char** argv) {
char b[] = { 0x98, 0x02, 0xCA, 0xF0 };
char *buffer = b;
//uint32_t num = 0x01234567;
uint32_t num = *((uint32_t *)buffer);
if (!is_big_endian()) {
num = reverse(*((uint32_t *)buffer));
}
num >>= 4;
printf("%x\n", num);
for(int i=0;i<5;i++) {
printf("%s\n", uint32_to_binary(num));
num = rotl32(num, 3, 28);
}
for(int i=0;i<5;i++) {
//printf("%08x\n", num);
printf("%s\n", uint32_to_binary(num));
num = rotr32(num, 3, 28);
}
unsigned char out[4];
memset(out, 0, sizeof(unsigned char) * 4);
num <<= 4;
if (!is_big_endian()) {
num = reverse(num);
}
*((uint32_t*)out) = num;
printf("[ ");
for (int i=0;i<4;i++) {
printf("%s0x%02x", i?", ":"", out[i] );
}
printf(" ]\n");
}
First you mask the top four most significant bits
*(buffer + 3) &= 0x0F;
Then you can perform the circular shift of the remaining 28 bits by x bits.
Note: This works on little-endian architectures (x86 PCs and most microcontrollers).
[...] that contains 4 bytes, but only 28 of them [...]
We got it, but...
I guess that you mistyped the second number in your example. Or do you "ignore" 4 bits on both the left and the right, so that you are actually interested in 24 bits? Anyway:
Use the same principle as in "Circular shift in C".
You need to convert your Buffer to a 32-bit arithmetic type first. Maybe uint32_t is what you need?
Where did Buffer get its value? You may need to think about endianness.
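One way to sidestep the endianness question is to assemble the 32-bit value with shifts instead of casting the pointer. This is a minimal sketch, assuming the buffer holds the bits in big-endian order (first byte most significant) and that the 28 relevant bits come first; load_be32 and store_be32 are hypothetical helper names.

```c
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit value, regardless of host byte order. */
uint32_t load_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Write a 32-bit value back as 4 big-endian bytes. */
void store_be32(unsigned char *buf, uint32_t value)
{
    buf[0] = (unsigned char)(value >> 24);
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)value;
}
```

Under those assumptions, load_be32(Buffer) >> 4 yields the 28-bit field in the low bits, ready for the rotation, and store_be32(Buffer, rotated << 4) writes it back (with the 4 ignored bits cleared).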

Extract 14-bit values from an array of bytes in C

In an arbitrary-sized array of bytes in C, I want to store 14-bit numbers (0-16,383) tightly packed. In other words, in the sequence:
0000000000000100000000000001
there are two numbers that I wish to be able to arbitrarily store and retrieve into a 16-bit integer. (in this case, both of them are 1, but could be anything in the given range) If I were to have the functions uint16_t 14bitarr_get(unsigned char* arr, unsigned int index) and void 14bitarr_set(unsigned char* arr, unsigned int index, uint16_t value), how would I implement those functions?
This is not for a homework project, merely my own curiosity. I have a specific project that this would be used for, and it is the key/center of the entire project.
I do not want an array of structs that have 14-bit values in them, as that generates waste bits for every struct that is stored. I want to be able to tightly pack as many 14-bit values as I possibly can into an array of bytes. (e.g.: in a comment I made, putting as many 14-bit values into a chunk of 64 bytes is desirable, with no waste bits. the way those 64 bytes work is completely tightly packed for a specific use case, such that even a single bit of waste would take away the ability to store another 14 bit value)
Well, this is bit fiddling at its best. Doing it with an array of bytes makes it more complicated than it would be with larger elements because a single 14 bit quantity can span 3 bytes, where uint16_t or anything bigger would require no more than two. But I'll take you at your word that this is what you want (no pun intended). This code will actually work with the constant set to anything 8 or larger (but not over the size of an int; for that, additional type casts are needed). Of course the value type must be adjusted if larger than 16.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define W 14
uint16_t arr_get(unsigned char* arr, size_t index) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
uint16_t result = arr[byte_index] >> bit_in_byte_index;
for (unsigned n_bits = 8 - bit_in_byte_index; n_bits < W; n_bits += 8)
result |= arr[++byte_index] << n_bits;
return result & ~(~0u << W);
}
void arr_set(unsigned char* arr, size_t index, uint16_t value) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
arr[byte_index] &= ~(0xff << bit_in_byte_index);
arr[byte_index++] |= value << bit_in_byte_index;
unsigned n_bits = 8 - bit_in_byte_index;
value >>= n_bits;
while (n_bits < W - 8) {
arr[byte_index++] = value;
value >>= 8;
n_bits += 8;
}
arr[byte_index] &= 0xff << (W - n_bits);
arr[byte_index] |= value;
}
int main(void) {
int mod = 1 << W;
int n = 50000;
unsigned x[n];
unsigned char b[2 * n];
for (int tries = 0; tries < 10000; tries++) {
for (int i = 0; i < n; i++) {
x[i] = rand() % mod;
arr_set(b, i, x[i]);
}
for (int i = 0; i < n; i++)
if (arr_get(b, i) != x[i])
printf("Err #%d: %d should be %d\n", i, arr_get(b, i), x[i]);
}
return 0;
}
Faster versions
Since you said in comments that performance is an issue: open coding the loops gives a roughly 10% speed improvement on my machine on the little test driver included in the original. This includes random number generation and testing, so perhaps the primitives are 20% faster. I'm confident that 16- or 32-bit array elements would give further improvements because byte access is expensive:
uint16_t arr_get(unsigned char* a, size_t i) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
return (a[iy] | (a[iy+1] << 8)) & 0x3fff;
case 2:
return ((a[iy] >> 2) | (a[iy+1] << 6)) & 0x3fff;
case 4:
return ((a[iy] >> 4) | (a[iy+1] << 4) | (a[iy+2] << 12)) & 0x3fff;
}
return ((a[iy] >> 6) | (a[iy+1] << 2) | (a[iy+2] << 10)) & 0x3fff;
}
#define M(IB) (~0u << (IB))
#define SETLO(IY, IB, V) a[IY] = (a[IY] & M(IB)) | ((V) >> (14 - (IB)))
#define SETHI(IY, IB, V) a[IY] = (a[IY] & ~M(IB)) | ((V) << (IB))
void arr_set(unsigned char* a, size_t i, uint16_t val) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
a[iy] = val;
SETLO(iy+1, 6, val);
return;
case 2:
SETHI(iy, 2, val);
a[iy+1] = val >> 6;
return;
case 4:
SETHI(iy, 4, val);
a[iy+1] = val >> 4;
SETLO(iy+2, 2, val);
return;
}
SETHI(iy, 6, val);
a[iy+1] = val >> 2;
SETLO(iy+2, 4, val);
}
Another variation
This is quite a bit faster yet on my machine, about 20% better than above:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> (ib % 8)) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
unsigned io = ib % 8;
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
Note that for this code to be safe you should allocate one extra byte at the end of the packed array. It always reads and writes 3 bytes even when the desired 14 bits are in the first 2.
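To make the extra byte concrete, here is a minimal sketch of how the buffer could be sized and checked. The names packed14_size, get14, set14 and packed14_roundtrip_ok are made up for this example; get14 and set14 follow the same logic as arr_get2 and arr_set2 above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for count 14-bit values, plus the one spare byte that the
   always-3-bytes accessors may touch after the last value. */
size_t packed14_size(size_t count)
{
    return (count * 14 + 7) / 8 + 1;
}

/* Same logic as arr_get2: read 3 bytes, shift, mask. */
static uint16_t get14(const unsigned char *a, size_t i)
{
    size_t ib = i * 14;
    size_t iy = ib / 8;
    unsigned buf = a[iy] | ((unsigned)a[iy + 1] << 8) | ((unsigned)a[iy + 2] << 16);
    return (uint16_t)((buf >> (ib % 8)) & 0x3fff);
}

/* Same logic as arr_set2: read 3 bytes, replace 14 bits, write 3 bytes. */
static void set14(unsigned char *a, size_t i, unsigned val)
{
    size_t ib = i * 14;
    size_t iy = ib / 8;
    unsigned io = (unsigned)(ib % 8);
    unsigned buf = a[iy] | ((unsigned)a[iy + 1] << 8) | ((unsigned)a[iy + 2] << 16);
    buf = (buf & ~(0x3fffu << io)) | ((val & 0x3fffu) << io);
    a[iy] = (unsigned char)(buf & 0xff);
    a[iy + 1] = (unsigned char)((buf >> 8) & 0xff);
    a[iy + 2] = (unsigned char)((buf >> 16) & 0xff);
}

/* Round-trip check: returns 1 if every stored value reads back unchanged. */
int packed14_roundtrip_ok(size_t count)
{
    unsigned char *buf = calloc(packed14_size(count), 1);
    int ok = 1;
    if (buf == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        set14(buf, i, (unsigned)((i * 37u + 5u) & 0x3fffu));
    for (size_t i = 0; i < count; i++)
        if (get14(buf, i) != (unsigned)((i * 37u + 5u) & 0x3fffu))
            ok = 0;
    free(buf);
    return ok;
}
```

For example, 4 values occupy exactly 7 bytes, so packed14_size(4) returns 8.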
One more variation
Finally, this runs just a bit slower than the one above (again on my machine; YMMV), but you don't need the extra byte. It uses one comparison per operation:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
unsigned buf = ib % 8 <= 2
? a[iy] | (a[iy+1] << 8)
: a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> io) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
if (io <= 2) {
unsigned buf = a[iy] | (a[iy+1] << 8);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
} else {
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
}
The easiest solution is to use a struct of eight bitfields:
typedef struct __attribute__((__packed__)) EightValues {
uint16_t v0 : 14,
v1 : 14,
v2 : 14,
v3 : 14,
v4 : 14,
v5 : 14,
v6 : 14,
v7 : 14;
} EightValues;
This struct has a size of 14*8 = 112 bits, which is 14 bytes (seven uint16_t). Now, all you need is to use the last three bits of the array index to select the right bitfield:
uint16_t bitarr14_get(unsigned char* arr, unsigned int index) { //renamed from 14bitarr_get: a C identifier cannot start with a digit
EightValues* accessPointer = (EightValues*)arr;
accessPointer += index >> 3; //select the right structure in the array
switch(index & 7) { //use the last three bits of the index to access the right bitfield
case 0: return accessPointer->v0;
case 1: return accessPointer->v1;
case 2: return accessPointer->v2;
case 3: return accessPointer->v3;
case 4: return accessPointer->v4;
case 5: return accessPointer->v5;
case 6: return accessPointer->v6;
case 7: return accessPointer->v7;
}
}
Your compiler will do the bit-fiddling for you.
The Basis for Storage Issue
The biggest issue you are facing is the fundamental question of "What is my basis for storage going to be?" You know the basics, what you have available to you is char, short, int, etc... The smallest being 8-bits. No matter how you slice your storage scheme, it will ultimately have to rest in memory in a unit of memory based on this 8 bit per byte layout.
The only optimal, no-bits-wasted allocation is an array whose size in bits is a common multiple of 14 and the element width. For bytes the least common multiple is 56 bits (7 chars, holding exactly 4 values); the full 112 bits (7 shorts or 14 chars) hold exactly 8 14-bit values and also fit an array of 16-bit shorts. This may be the best option. Of course, if you have no need for that many values, it wouldn't be of much use anyway, as it would waste more than the 4 bits lost in a single unsigned value holding two 14-bit fields.
Let me know if this is something you would like to further explore. If it is, I'm happy to help with the implementation.
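To make the grouping idea concrete: since lcm(14, 8) = 56, a group of 7 bytes already holds exactly 4 values with no waste (two such groups give the 112 bits mentioned above). The following is a minimal sketch under stated assumptions, not a definitive implementation: the names group_load, group_store, group14_get and group14_set are made up, the array length is assumed to be a multiple of 7 bytes, and the bit order within a group (lowest-addressed byte holds the lowest bits) is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Load one 7-byte group (4 values x 14 bits = 56 bits) into a 64-bit word,
   lowest-addressed byte in the lowest bits. */
static uint64_t group_load(const unsigned char *g)
{
    uint64_t v = 0;
    for (int i = 6; i >= 0; i--)
        v = (v << 8) | g[i];
    return v;
}

/* Write the low 56 bits of v back into a 7-byte group. */
static void group_store(unsigned char *g, uint64_t v)
{
    for (int i = 0; i < 7; i++) {
        g[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

uint16_t group14_get(const unsigned char *arr, size_t index)
{
    unsigned shift = 14u * (unsigned)(index % 4);
    return (uint16_t)((group_load(arr + (index / 4) * 7) >> shift) & 0x3fff);
}

void group14_set(unsigned char *arr, size_t index, uint16_t value)
{
    unsigned char *g = arr + (index / 4) * 7;
    unsigned shift = 14u * (unsigned)(index % 4);
    uint64_t v = group_load(g);
    /* clear the old 14 bits, then insert the new value */
    v = (v & ~((uint64_t)0x3fff << shift)) | ((uint64_t)(value & 0x3fff) << shift);
    group_store(g, v);
}
```

Each access touches only the 7 bytes of its own group, so no padding byte is needed as long as the array is allocated in whole groups.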
Bitfield Struct
The comments regarding bitfield packing or bit packing are exactly what you need to do. This can involve a structure alone or in combination with a union, or by manually right/left shifting values directly as needed.
A short example applicable to your situation (if I understood correctly you want 2 14-bit areas in memory) would be:
#include <stdio.h>
typedef struct bitarr14 {
unsigned n1 : 14,
n2 : 14;
} bitarr14;
char *binstr (unsigned long n, size_t sz);
int main (void) {
bitarr14 mybitfield;
mybitfield.n1 = 1;
mybitfield.n2 = 1;
printf ("\n mybitfield in memory : %s\n\n",
binstr (*(unsigned *)&mybitfield, 28));
return 0;
}
char *binstr (unsigned long n, size_t sz)
{
static char s[64 + 1] = {0};
char *p = s + 64;
register size_t i = 0;
for (i = 0; i < sz; i++) {
p--;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
Output
$ ./bin/bitfield14
mybitfield in memory : 0000000000000100000000000001
Note: the dereference of mybitfield for purposes of printing the value in memory breaks strict aliasing and it is intentional just for the purpose of the output example.
The beauty, and purpose for using a struct in the manner provided is it will allow direct access to each 14-bit part of the struct directly, without having to manually shift, etc.
Update - assuming you want big endian bit packing. This is code meant for a fixed size code word. It's based on code I've used for data compression algorithms. The switch case and fixed logic helps with performance.
typedef unsigned short uint16_t;
void bit14arr_set(unsigned char* arr, unsigned int index, uint16_t value)
{
unsigned int bitofs = (index*14)%8;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
*arr++ = (unsigned char)(value >> 6);
*arr &= 0x03;
*arr |= (unsigned char)(value << 2);
break;
case 2: /* bit offset == 2 */
*arr &= 0xc0;
*arr++ |= (unsigned char)(value >> 8);
*arr = (unsigned char)(value << 0);
break;
case 4: /* bit offset == 4 */
*arr &= 0xf0;
*arr++ |= (unsigned char)(value >> 10);
*arr++ = (unsigned char)(value >> 2);
*arr &= 0x3f;
*arr |= (unsigned char)(value << 6);
break;
case 6: /* bit offset == 6 */
*arr &= 0xfc;
*arr++ |= (unsigned char)(value >> 12);
*arr++ = (unsigned char)(value >> 4);
*arr &= 0x0f;
*arr |= (unsigned char)(value << 4);
break;
}
}
uint16_t bit14arr_get(unsigned char* arr, unsigned int index)
{
unsigned int bitofs = (index*14)%8;
unsigned short value;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
value = ((unsigned int)(*arr++) ) << 6;
value |= ((unsigned int)(*arr ) ) >> 2;
break;
case 2: /* bit offset == 2 */
value = ((unsigned int)(*arr++)&0x3f) << 8;
value |= ((unsigned int)(*arr ) ) >> 0;
break;
case 4: /* bit offset == 4 */
value = ((unsigned int)(*arr++)&0x0f) << 10;
value |= ((unsigned int)(*arr++) ) << 2;
value |= ((unsigned int)(*arr ) ) >> 6;
break;
case 6: /* bit offset == 6 */
value = ((unsigned int)(*arr++)&0x03) << 12;
value |= ((unsigned int)(*arr++) ) << 4;
value |= ((unsigned int)(*arr ) ) >> 4;
break;
}
return value;
}
Here's my version (updated to fix bugs):
#define PACKWID 14 // number of bits in packed number
#define PACKMSK ((1 << PACKWID) - 1)
#ifndef ARCHBYTEALIGN
#define ARCHBYTEALIGN 1 // align to 1=bytes, 2=words
#endif
#define ARCHBITALIGN (ARCHBYTEALIGN * 8)
typedef unsigned char byte;
typedef unsigned short u16;
typedef unsigned int u32;
typedef long long s64;
typedef u16 pcknum_t; // container for packed number
typedef u32 acc_t; // working accumulator
#ifndef ARYOFF
#define ARYOFF long
#endif
#define PRT(_val) ((unsigned long) _val)
typedef unsigned ARYOFF aryoff_t; // bit offset
// packary -- access array of packed numbers
// RETURNS: old value
extern inline pcknum_t
packary(byte *ary,aryoff_t idx,int setflg,pcknum_t newval)
// ary -- byte array pointer
// idx -- index into array (packed number relative)
// setflg -- 1=set new value, 0=just get old value
// newval -- new value to set (if setflg set)
{
aryoff_t absbitoff;
aryoff_t bytoff;
aryoff_t absbitlhs;
acc_t acc;
acc_t nval;
int shf;
acc_t curmsk;
pcknum_t oldval;
// get the absolute bit number for the given array index
absbitoff = idx * PACKWID;
// get the byte offset of the lowest byte containing the number
bytoff = absbitoff / ARCHBITALIGN;
// get absolute bit offset of first containing byte
absbitlhs = bytoff * ARCHBITALIGN;
// get amount we need to shift things by:
// (1) our accumulator
// (2) values to set/get
shf = absbitoff - absbitlhs;
#ifdef MODSHOW
do {
static int modshow;
if (modshow > 50)
break;
++modshow;
printf("packary: MODSHOW idx=%ld shf=%d bytoff=%ld absbitlhs=%ld absbitoff=%ld\n",
PRT(idx),shf,PRT(bytoff),PRT(absbitlhs),PRT(absbitoff));
} while (0);
#endif
// adjust array pointer to the portion we want (guaranteed to span)
ary += bytoff * ARCHBYTEALIGN;
// fetch the number + some other bits
acc = *(acc_t *) ary;
// get the old value
oldval = (acc >> shf) & PACKMSK;
// set the new value
if (setflg) {
// get shifted mask for packed number
curmsk = PACKMSK << shf;
// remove the old value
acc &= ~curmsk;
// ensure caller doesn't pass us a bad value
nval = newval;
#if 0
nval &= PACKMSK;
#endif
nval <<= shf;
// add in the value
acc |= nval;
*(acc_t *) ary = acc;
}
return oldval;
}
pcknum_t
int_get(byte *ary,aryoff_t idx)
{
return packary(ary,idx,0,0);
}
void
int_set(byte *ary,aryoff_t idx,pcknum_t newval)
{
packary(ary,idx,1,newval);
}
Here are benchmarks:
set: 354740751 7.095 -- gene
set: 203407176 4.068 -- rcgldr
set: 298946533 5.979 -- craig
get: 268574627 5.371 -- gene
get: 166839767 3.337 -- rcgldr
get: 207764612 4.155 -- craig

Efficient test for pythagorean triples in modulo integer space

I was wondering what the most effective formula is for testing whether three numbers are a Pythagorean triple.
Just as a reminder: a Pythagorean triple is a set of three integers where a²+b²=c².
I don't mean the most effective formula in terms of time, but the one that is least likely to cause an overflow in a specific integer type (let's say a 32-bit unsigned int).
I was trying a bit with rearrangements of a*a + b*b == c*c:
Let's assume a<=b<c, then the best formula I could get to is:
2b*(c-b) == (a+b-c) * (a-b+c)
With this formula it can be shown that the right side is smaller than a*c, and so is the left side, but a*c doesn't look like a huge improvement over c*c.
So my question is, if there is a better formula for this conditional that works with bigger numbers without overflowing an integer space. The execution time of the formula doesn't matter that much, besides it should be O(1).
PS: I don't know if I should post such a question here or on Mathematics SE, but to me it seems to be more about programming.
EDIT If you need to have 32bit integers all the way down then you can just modify the math to fit your requirement. To keep it simple I do the math (squaring and summing) on 16bit chunks of data and use a struct that contains 2 unsigned ints as the result.
http://ideone.com/er2TaS
#include <iostream>
using namespace std;
struct u64 {
unsigned int lo;
unsigned int hi;
bool of;
};
u64 square(unsigned int a) {
u64 result;
unsigned int alo = (a & 0xffff);
unsigned int ahi = (a >> 16);
unsigned int aalo = alo * alo;
unsigned int aami = alo * ahi;
unsigned int aahi = ahi * ahi;
unsigned int aa1 = aalo & 0xffff;
unsigned int aa2 = (aalo >> 16) + (aami & 0xffff) + (aami & 0xffff);
unsigned int aa3 = (aa2 >> 16) + (aami >> 16) + (aami >> 16) + (aahi & 0xffff);
unsigned int aa4 = (aa3 >> 16) + (aahi >> 16);
result.lo = (aa1 & 0xffff) | ((aa2 & 0xffff) << 16);
result.hi = (aa3 & 0xffff) | (aa4 << 16);
result.of = false; // 0xffffffff^2 can't overflow
return result;
}
u64 sum(u64 a, u64 b) {
u64 result;
unsigned int a1 = a.lo & 0xffff;
unsigned int a2 = a.lo >> 16;
unsigned int a3 = a.hi & 0xffff;
unsigned int a4 = a.hi >> 16;
unsigned int b1 = b.lo & 0xffff;
unsigned int b2 = b.lo >> 16;
unsigned int b3 = b.hi & 0xffff;
unsigned int b4 = b.hi >> 16;
unsigned int s1 = a1 + b1;
unsigned int s2 = a2 + b2 + (s1 >> 16);
unsigned int s3 = a3 + b3 + (s2 >> 16);
unsigned int s4 = a4 + b4 + (s3 >> 16);
result.lo = (s1 & 0xffff) | ((s2 & 0xffff) << 16);
result.hi = (s3 & 0xffff) | ((s4 & 0xffff) << 16);
result.of = (s4 > 0xffff ? true : false);
return result;
}
bool isTriple(unsigned int a, unsigned int b, unsigned int c) {
u64 aa = square(a);
u64 bb = square(b);
u64 cc = square(c);
u64 aabb = sum(aa, bb);
return aabb.lo == cc.lo && aabb.hi == cc.hi && aabb.of == false;
}
int main() {
cout << isTriple(3,4,5) << endl;
cout << isTriple(2800,9600,10000) << endl;
return 0;
}
Converting your 32-bit integers to 64-bit integers (long long), or even to floating-point doubles, would reduce the chance of overflow and remain, programmatically, O(1), since all the major architectures (x86, ARM, etc.) have int-to-double conversion instructions at the low level, and widening an int to a long long is also an O(1) operation.
bool isTriple(int a, int b, int c) {
long long bigA = a;
long long bigB = b;
long long bigC = c;
return bigA * bigA + bigB * bigB == bigC * bigC;
}
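For unsigned 32-bit inputs there is one more thing to watch: each square fits in 64 bits, but the sum a*a + b*b can exceed 2^64 and wrap. A minimal sketch that avoids this, assuming 64-bit unsigned arithmetic is acceptable, compares c² − a² with b² instead; is_triple_u32 is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a*a + b*b == c*c, evaluated without any 64-bit wraparound. */
bool is_triple_u32(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t aa = (uint64_t)a * a;  /* at most (2^32 - 1)^2 < 2^64 */
    uint64_t bb = (uint64_t)b * b;
    uint64_t cc = (uint64_t)c * c;
    /* cc >= aa guarantees that the subtraction cannot wrap */
    return cc >= aa && cc - aa == bb;
}
```

Every intermediate value stays below 2^64, so the check holds over the whole uint32_t range, for example for (2400000000, 3200000000, 4000000000).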
I think little rearrangement would help a lot.
a²+b²=c²
can be written as b²=c²-a²
which is b² = (c-a)(c+a)
and hence we arrive at
b/(c+a) = (c-a)/b
or (c+a)/b = b/(c-a)
Now using the above equation, you do not need to compute squares.
So we must do this
if(((c+a)/(double)b)==((double)(b)/(c-a)))
printf("Yes it is pythagorean triples");
else printf("No it is not");

Byte swapping in bit wise operations

I have this function called byte swap I am supposed to implement. The idea is that the function takes 3 integers (int x, int y, int z) and the function will swap the y and z bytes of the int x. The restrictions are pretty much limited to bit wise operations (no loops, and no if statements or logical operators such as ==).
I don't believe that I presented this problem adequately, so I'm going to reattempt it.
I now understand that
byte 1 is referring to bits 0-7
byte 2 is referring to bits 8-15
byte 3 16-23
byte 4 24-31
My function is supposed to take 3 integer inputs, x, y and z. The y byte and z byte on the x then would have to get switched
int byteSwap(int x, int y, int z)
ex of the working function
byteSwap(0x12345678, 1, 3) = 0x56341278
byteSwap(0xDEADBEEF, 0, 2) = 0xDEEFBEAD
My original code had some huge errors in it, namely the fact that I was considering a byte to be 2 bits instead of 8. The main problem that I'm struggling with is that I do not know how to access the bits inside of the given byte. For example, when I'm given byte 4 and 5, how do I access their respective bits? As far as I can tell I can't find a mathematical relationship between the given byte and its starting bit. I'm assuming I have to shift and then mask, and save those to variables. Though I cannot even get that far.
Extract the ith byte of x by using (x & ((1ll << ((i + 1) * 8)) - 1)) >> (i * 8). Swap the two bytes using the XOR operator, and put the swapped bytes back in their places.
int x, y, z;
y = 1, z = 3;
x = 0x12345678;
int a, b; /* bytes to swap */
a = (x & ((1ll << ((y + 1) * 8)) - 1)) >> (y * 8);
b = (x & ((1ll << ((z + 1) * 8)) - 1)) >> (z * 8);
/* swap */
a = a ^ b;
b = a ^ b;
a = a ^ b;
/* put zeros in bytes to swap */
x = x & ~(0xffu << (y * 8)); /* unsigned mask: 0xff << 24 would overflow a signed int */
x = x & ~(0xffu << (z * 8));
/* put new bytes in place */
x = x | ((unsigned)a << (y * 8));
x = x | ((unsigned)b << (z * 8));
When you say the 'the y and z bytes of x' this implies x is an array of bytes, not an integer. If so:
x[z] ^= x[y];
x[y] ^= x[z];
x[z] ^= x[y];
will do the trick, by swapping x[y] and x[z]
After your edit, it appears you want to swap individual bytes of a 32 bit integer:
On a little-endian machine:
int
swapbytes (int x, int y, int z)
{
char *b = (char *)&x;
b[z] ^= b[y];
b[y] ^= b[z];
b[z] ^= b[y];
return x;
}
On a big-endian machine:
int
swapbytes (int x, int y, int z)
{
char *b = (char *)&x;
b[3-z] ^= b[3-y];
b[3-y] ^= b[3-z];
b[3-z] ^= b[3-y];
return x;
}
With a strict interpretation of the rules, you don't even need the xor trick:
int
swapbytes (int x, int y, int z)
{
char *b = (char *)&x;
char tmp = b[z];
b[z] = b[y];
b[y] = tmp;
return x;
}
On a big-endian machine:
int
swapbytes (int x, int y, int z)
{
char *b = (char *)&x;
char tmp = b[3-z];
b[3-z] = b[3-y];
b[3-y] = tmp;
return x;
}
If you want to do it using bit shifts (note <<3 multiplies by 8):
int
swapbytes (unsigned int x, int y, int z)
{
unsigned int masky = 0xffu << (y<<3); /* unsigned constant: 0xff << 24 would overflow a signed int */
unsigned int maskz = 0xffu << (z<<3);
unsigned int origy = (x & masky) >> (y<<3);
unsigned int origz = (x & maskz) >> (z<<3);
return (x & ~masky & ~maskz) | (origz << (y<<3)) | (origy << (z<<3));
}
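A variation that is not taken from the answers above: XOR both positions with the difference of the two bytes. It needs no loops, branches or comparisons, and it leaves x unchanged when y == z. This is a minimal sketch assuming y and z are in the range 0 to 3; byte_swap_xor is a made-up name.

```c
#include <stdint.h>

/* Swap byte y and byte z of x (byte 0 is bits 0-7). */
uint32_t byte_swap_xor(uint32_t x, unsigned y, unsigned z)
{
    unsigned sy = y << 3;  /* bit offset of byte y */
    unsigned sz = z << 3;  /* bit offset of byte z */
    /* bits in which the two bytes differ */
    uint32_t diff = ((x >> sy) ^ (x >> sz)) & 0xFFu;
    /* flipping those bits in both positions exchanges the bytes */
    return x ^ (diff << sy) ^ (diff << sz);
}
```

With the examples from the question, byte_swap_xor(0x12345678, 1, 3) gives 0x56341278 and byte_swap_xor(0xDEADBEEF, 0, 2) gives 0xDEEFBEAD.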

Swap byte 2 and 4 in a 32 bit integer

I had this interview question -
Swap byte 2 and byte 4 within an integer sequence.
The integer is 4 bytes wide, i.e. 32 bits.
My approach was to use a char pointer and a temporary char to swap the bytes.
For clarity I have broken it into steps; otherwise a character array could be used.
unsigned char *b2, *b4, tmpc;
int n = 0xABCD; ///expected output 0xADCB
b2 = &n; b2++;
b4 = &n; b4 +=3;
///swap the values;
tmpc = *b2;
*b2 = *b4;
*b4 = tmpc;
Any other methods?
int someInt = 0x12345678;
int byte2 = someInt & 0x00FF0000;
int byte4 = someInt & 0x000000FF;
int newInt = (someInt & 0xFF00FF00) | (byte2 >> 16) | (byte4 << 16);
To avoid any concerns about sign extension:
int someInt = 0x12345678;
int newInt = (someInt & 0xFF00FF00) | ((someInt >> 16) & 0x000000FF) | ((someInt << 16) & 0x00FF0000);
(Or, to really impress them, you could use the triple XOR technique.)
Just for fun (probably a tupo somewhere):
int newInt = someInt ^ ((someInt >> 16) & 0x000000FF);
newInt = newInt ^ ((newInt << 16) & 0x00FF0000);
newInt = newInt ^ ((newInt >> 16) & 0x000000FF);
(Actually, I just tested it and it works!)
You can mask out the bytes you want and shift them around. Something like this:
unsigned int swap(unsigned int n) {
unsigned int b2 = (0x0000FF00 & n);
unsigned int b4 = (0xFF000000 & n);
n ^= b2 | b4; // Clear the second and fourth bytes
n |= (b2 << 16) | (b4 >> 16); // Swap and write them.
return n;
}
This assumes that the "first" byte is the lowest order byte (even if in memory it may be stored big-endian).
Also it uses unsigned ints everywhere to avoid right shifting introducing extra 1s due to sign extension.
What about unions?
#include <stdio.h>
int main(void)
{
char tmp;
union {int n; char ary[4]; } un;
un.n = 0xABCDEF00;
tmp = un.ary[3];
un.ary[3] = un.ary[1];
un.ary[1] = tmp;
printf("0x%.2X\n", un.n);
}
in > 0xABCDEF00
out>0xEFCDAB00
Please don't forget to check endianness. This only works on little-endian machines, but it should not be hard to make it portable.
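One possible way to make the union approach portable is to pick the array indices at run time. This is a minimal sketch that only distinguishes plain little-endian and big-endian hosts; host_is_little_endian and swap_b2_b4_union are hypothetical names, and the function swaps the same two bytes as the example above (bits 24-31 and bits 8-15).

```c
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one
   (mixed-endian layouts are not handled). */
static int host_is_little_endian(void)
{
    union { uint32_t i; unsigned char c[4]; } probe = { 0x01020304u };
    return probe.c[0] == 0x04;
}

uint32_t swap_b2_b4_union(uint32_t n)
{
    union { uint32_t n; unsigned char ary[4]; } un;
    int little = host_is_little_endian();
    int top = little ? 3 : 0;     /* index of bits 24-31 */
    int second = little ? 1 : 2;  /* index of bits 8-15 */
    unsigned char tmp;

    un.n = n;
    tmp = un.ary[top];
    un.ary[top] = un.ary[second];
    un.ary[second] = tmp;
    return un.n;
}
```

On either byte order, swap_b2_b4_union(0xABCDEF00) gives 0xEFCDAB00, matching the output shown above. The shift-and-mask versions avoid the question entirely, since they never look at the in-memory layout.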

Resources