Rotating Hash for 16 bit

This site gives the following description of a rotating hash:
unsigned rot_hash ( void *key, int len )
{
unsigned char *p = key;
unsigned h = 0;
int i;
for ( i = 0; i < len; i++ )
h = ( h << 4 ) ^ ( h >> 28 ) ^ p[i];
return h;
}
The returned value is 32 bits here. However, I want to return a 16-bit hash value. For that purpose, is it correct to update h as follows in the loop? Assume h is declared as a 16-bit integer here.
for ( i = 0; i < len; i++ )
h = ( h << 4 ) ^ ( h >> 12 ) ^ p[i];

It is probably best to keep the big hash and only truncate on return, like this:
for ( i = 0; i < len; i++ )
h = ( h << 4 ) ^ ( h >> 28 ) ^ p[i];
return h & 0xffff;
The shift constants 4 and 28 are probably not the best choice (in short: because they have a common divisor).
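Put together, the truncate-on-return suggestion might look like the following sketch. The function name rot_hash16 is made up for illustration, and the state is declared uint32_t (an assumption; the original uses unsigned, which is 32 bits on typical platforms) so that h >> 28 really completes a 32-bit rotation.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotating hash with a 32-bit state, truncated to 16 bits only on return.
   rot_hash16 is a hypothetical name; uint32_t is assumed for the state. */
uint16_t rot_hash16(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h << 4) ^ (h >> 28) ^ p[i];   /* rotate left by 4, mix in the byte */
    return (uint16_t)(h & 0xffff);          /* keep only the low 16 bits */
}
```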
After some experimentation, I arrived at the following hash function (the one used in Wakkerbot), which aims for maximal entropy in the lower bits so that a power-of-two table size can be used:
unsigned hash_mem(void *dat, size_t len)
{
unsigned char *str = (unsigned char*) dat;
unsigned val=0;
size_t idx;
for(idx=0; idx < len; idx++ ) {
val ^= (val >> 2) ^ (val << 5) ^ (val << 13) ^ str[idx] ^ 0x80001801;
}
return val;
}
The extra perturbation with 0x80001801 is not strictly needed, but it helps if the hashed items have long common prefixes, especially when those prefixes consist of zero bytes.
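To illustrate why entropy in the low bits matters: with a power-of-two table, the bucket index is simply the hash masked down to its low bits. A minimal sketch, where TABLE_SIZE and bucket_for are hypothetical names and hash_mem is copied from above with only comments added (32-bit unsigned assumed):

```c
#include <stddef.h>

#define TABLE_SIZE 1024u /* hypothetical table size; must be a power of two */

/* hash_mem from above, unchanged apart from formatting and comments. */
unsigned hash_mem(void *dat, size_t len)
{
    unsigned char *str = (unsigned char *)dat;
    unsigned val = 0;
    size_t idx;
    for (idx = 0; idx < len; idx++) {
        /* mix shifted copies of the state, the next byte and the constant */
        val ^= (val >> 2) ^ (val << 5) ^ (val << 13) ^ str[idx] ^ 0x80001801;
    }
    return val;
}

/* With a power-of-two table, the bucket is just the low bits of the hash. */
size_t bucket_for(void *key, size_t len)
{
    return hash_mem(key, len) & (TABLE_SIZE - 1);
}
```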

It's hard to talk about "correct" with hashes, because any deterministic result can be considered correct. Perhaps the hash distribution won't be as good, but this hash doesn't seem like the strongest anyway.
Unless h itself is a 16-bit type (so that the assignment discards the upper bits), the number you get with the change you suggest will still be a 32-bit number, and its high 16 bits won't be zero.
The easiest thing to do is to change nothing and cast the result to unsigned short.
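If h really is declared as a 16-bit type, as the question assumes, the assignment back to h discards everything above bit 15, and (h << 4) ^ (h >> 12) then behaves as a 16-bit rotation. A sketch of that variant for comparison (rot_hash16_state is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Rotating hash whose state is genuinely 16 bits wide. h << 4 is computed
   in int, and the cast back to uint16_t drops bits 16 and up, which turns
   the two shifts into a rotation by 4 within 16 bits. */
uint16_t rot_hash16_state(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint16_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (uint16_t)((h << 4) ^ (h >> 12) ^ p[i]);
    return h;
}
```

For the input "ABCDE" this gives 0x6740, while keeping a 32-bit state and truncating on return gives 0x6705. Neither is more "correct"; they are simply different hash functions.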

Related

Converting 8 byte char array into long

How do we convert an 8-byte char array into a long, since << does not work for type long?
#define word_size 8
long num = 0;
char a[word_size] = "\x88\x99\xaa\x0bb\xcc\xdd\xee\xff";
for (i=0; i < word_size;i++) {
a[(word_size-1) - i] |= (num << (8*(word_size - i - 1))) & 0xFF;
}
printf("%lx\n", num);

The following code is more efficient:
#include <stdint.h>
unsigned char a[word_size] = ...;
uint64_t num = 0; // unsigned, so shifting a set bit into the top byte is well defined
for ( size_t i = 0 ; i < sizeof(a) ; i++ )
    num = (num << 8) | a[i];
This assumes big endian (highest order byte first) ordering of the bytes in the array. For little endian (as you appear to use) just process it top-down:
for ( int i = sizeof(a) ; --i >= 0 ; )
    num = (num << 8) | a[i];
Note: whether char is signed or unsigned is implementation-defined, so nail it down to be unsigned; otherwise sign extension will corrupt the bitwise OR. Better yet, use uint8_t, which is defined to be exactly 8 bits, while char is not.
Note: you should use all-uppercase names for constants: WORD_SIZE instead of word_size. That is a commonly accepted convention (about the only one regarding case for identifiers in C).
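As a runnable sketch of both loops, using uint8_t and WORD_SIZE as suggested and the byte values from the question (the function names bytes_to_u64_be and bytes_to_u64_le are made up for this example):

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 8

/* Big endian: a[0] is the most significant byte. */
uint64_t bytes_to_u64_be(const uint8_t a[WORD_SIZE])
{
    uint64_t num = 0;
    for (size_t i = 0; i < WORD_SIZE; i++)
        num = (num << 8) | a[i];
    return num;
}

/* Little endian: a[0] is the least significant byte, so walk the array top-down. */
uint64_t bytes_to_u64_le(const uint8_t a[WORD_SIZE])
{
    uint64_t num = 0;
    for (size_t i = WORD_SIZE; i-- > 0; )
        num = (num << 8) | a[i];
    return num;
}
```

An unsigned 64-bit accumulator also avoids the signed overflow that shifting 0x88 into the top byte of an int64_t would cause.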

Calculate parity bit from a string in C

I am trying to calculate the parity bit of a string using the following code. I first calculate a parityByte for the string and then calculate a parityBit for that byte.
From what I have gathered, these functions should do the trick, but right now I'm not so sure. The program in which I use them fails, and I would like to know if it's because of these or if I should look some other place.
char calculateParity(char *payload, int size){
char r = 0;
int i;
for(i = 0; i < size; i++){
r ^= payload[i];
}
return calcParityBit(r);
}
char calcParityBit(char x){
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
return x & 1;
}

With help from Bit Twiddling Hacks
char calcParityBit (unsigned char v)
{
return (0x6996u >> ((v ^ (v >> 4)) & 0xf)) & 1;
}
This takes 5 operations versus 7 (after taking squeamish ossifrage's good advice).
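The constant 0x6996 is a 16-entry lookup table packed into bits: bit n is the parity of n (binary 0110 1001 1001 0110). Since there are only 256 possible inputs, a sketch that checks the trick against a plain bit-by-bit loop is cheap; the names parity_table_trick and parity_naive are illustrative.

```c
/* Bit Twiddling Hacks nibble trick: fold the byte into 4 bits, then look up
   the parity of that nibble in the constant 0x6996. */
int parity_table_trick(unsigned char v)
{
    return (0x6996u >> ((v ^ (v >> 4)) & 0xfu)) & 1u;
}

/* Straightforward reference: XOR the bits together one at a time. */
int parity_naive(unsigned char v)
{
    int p = 0;
    for (; v; v >>= 1)
        p ^= v & 1;
    return p;
}
```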

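You must remember:
1) 'x >> a' is the same as for(int i = 0; i < a; i++) x /= 2; only for non-negative values. If you use the '>>' operator on a negative value of a SIGNED type, typical implementations duplicate the first (sign) bit, which is 1 for negative values (strictly, the result is implementation-defined), so the result rounds toward minus infinity, while x /= 2 rounds toward zero.
2) the operands of '>>' and '<<' are promoted first, so a char or unsigned char is shifted as an int, and the result has that promoted type, not the type of the original variable.
(Error example: unsigned char y = (x << 2) >> 2; does not reset (to 0) the two top bits of x, because x << 2 is computed in int and no bits are shifted out.)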
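A small sketch of the promotion pitfall in point 2 (the function names are hypothetical): casting the intermediate result back to unsigned char is what actually discards the two top bits.

```c
/* Wrong: x is promoted to int, so x << 2 keeps all 10 bits and the
   right shift simply restores the original value. */
unsigned char clear_top2_wrong(unsigned char x)
{
    return (x << 2) >> 2;
}

/* Right: truncate to 8 bits after the left shift, then shift back. */
unsigned char clear_top2_right(unsigned char x)
{
    return (unsigned char)(x << 2) >> 2;
}
```

A plain mask, x & 0x3F, gives the same result as the corrected version without any shifting.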

As r3mainer comments: use unsigned char for the calculation. As char may be signed, right-shifting it may replicate the sign bit.
Further, code typically runs best with a return type of int rather than char. I recommend returning int, or even simply bool.
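#include <limits.h> /* CHAR_BIT */
// Find even parity of the low `width` bits of `par`
// (any width from 1 up to the width of an unsigned)
int calcEvenParityBit(unsigned par, unsigned width) {
    // Discard bits above `width` so they cannot leak into the result
    if (width < sizeof par * CHAR_BIT)
        par &= (1u << width) - 1;
    while (width > 1) {
        unsigned half = (width + 1) / 2; // round up, so odd widths are handled too
        par ^= par >> half;              // fold the upper part onto the lower part
        par &= (1u << half) - 1;         // keep only the folded bits
        width = half;
    }
    // Only return Least Significant Bit
    return par % 2;
}
int calculateEvenParity(char *payload, int size) {
    unsigned char r = 0;
    int i;
    for(i = 0; i < size; i++) {
        r ^= payload[i];
    }
    return calcEvenParityBit(r, CHAR_BIT);
}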
Invert the result for odd parity.
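For comparison, here is a compact sketch of the same XOR-then-fold approach over a whole buffer, assuming 8-bit bytes (CHAR_BIT == 8) so that three fixed folds suffice; buffer_even_parity and buffer_odd_parity are names made up for this example.

```c
#include <stddef.h>

/* Even-parity bit of a whole buffer. XOR-ing all bytes together preserves
   the overall parity; the three folds then reduce 8 bits to 1.
   Assumes CHAR_BIT == 8. */
int buffer_even_parity(const void *data, size_t size)
{
    const unsigned char *p = data;
    unsigned r = 0;
    for (size_t i = 0; i < size; i++)
        r ^= p[i];
    r ^= r >> 4;   /* 8 bits -> 4 */
    r ^= r >> 2;   /* 4 bits -> 2 */
    r ^= r >> 1;   /* 2 bits -> 1 */
    return (int)(r & 1u);
}

/* Odd parity is simply the inverted even-parity bit. */
int buffer_odd_parity(const void *data, size_t size)
{
    return !buffer_even_parity(data, size);
}
```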

Your function:
char calcParityBit(char x){
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
return x & 1;
}
calculates parity for only three bits of your byte. To calculate parity of the entire 8 bits number, you can do something like this:
char calcParityBit(char x){
return ( (x>>7) ^
(x>>6) ^
(x>>5) ^
(x>>4) ^
(x>>3) ^
(x>>2) ^
(x>>1) ^
(x) ) & 1;
}
Since you only keep the least significant bit, it does not matter for this solution (which is derived from yours) that your argument is signed and that the right shift may fill the vacated bits with 1s when the most significant bit is 1.
Still, it is good practice not to use a signed type when the sign is of no actual use and you treat the number as unsigned.

Bitwise comparison

I have an array that represents an 8x8 "bit" block
unsigned char plane[8]
What I want to do is loop through this "bit" block horizontally and count the number of times a change occurs between a 1 and a 0.
When I extract a bit, it gets stored in an unsigned char, so basically I want to increment a count when one char is nonzero and the other is zero.
What I have is the following:
int getComplexity(unsigned char *plane) {
int x, y, count = 0;
unsigned char bit;
for(x = 0; x < 8; x++) {
for(y = 0; y < 8; y++) {
if(y == 0) {
bit = plane[x] & 1 << y;
continue;
}
/*
MISSING CODE
*/
}
}
}
For the missing code, I could do:
if( (bit && !(plane[x] & 1 << y)) || (!bit && (plane[x] & 1 << y)) ) {
bit = plane[x] & 1 << y;
count++;
}
However, what I really want to see is whether there is some clever bitwise operation to do this step instead of having two separate tests.

This is really just a GCC solution, because the popcount builtin won't work on every other compiler.
#include <stdint.h>
#include <string.h>
unsigned char plane[8];
static const uint64_t tmask = 0x8080808080808080ULL;
uint64_t uplane;
memcpy(&uplane, plane, sizeof uplane); // pull the whole array into a single integer (no aliasing or alignment problems)
return __builtin_popcountll( ~tmask & (uplane ^ (uplane >> 1) ) );
On x86, the popcnt instruction wasn't actually available until SSE4.2 (so rather recently).
Also, although this looks like it relies on endianness, it doesn't, because the mask keeps the individual bytes from interacting.
It does make some assumptions about the way memory works, though.
As a side note, doing the same thing in the "vertical" direction (comparing each row with the next one) is just as easy:
return __builtin_popcountll(0x00FFFFFFFFFFFFFFULL & ( uplane ^ (uplane >> 8) ) );
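If GCC builtins or the 64-bit load are not an option, the same idea works row by row with a portable bit count. This is a sketch with illustrative names: (row ^ (row >> 1)) & 0x7F marks every position where a bit differs from its neighbour within a row, and plane[x] ^ plane[x + 1] marks the differences between adjacent rows.

```c
/* Portable bit count (Kernighan's trick: each step clears the lowest set bit). */
static int count_bits(unsigned v)
{
    int n = 0;
    for (; v; v &= v - 1)
        n++;
    return n;
}

/* Horizontal changes: compare each bit with its neighbour in the same row.
   The 0x7F mask drops bit 7, which has no neighbour above it. */
int horizontal_changes(const unsigned char plane[8])
{
    int count = 0;
    for (int x = 0; x < 8; x++)
        count += count_bits((plane[x] ^ (plane[x] >> 1)) & 0x7Fu);
    return count;
}

/* Vertical changes: compare each row with the next one. */
int vertical_changes(const unsigned char plane[8])
{
    int count = 0;
    for (int x = 0; x < 7; x++)
        count += count_bits(plane[x] ^ plane[x + 1]);
    return count;
}
```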

Is there a more optimal way to approach some of these functions?

I recently completed some bit manipulation exercises from a textbook and have firmly grasped some of the core ideas behind manipulating bits. My main concern in this post is optimizing my current code; I have a hunch that there are some functions I could approach better. Do you have any recommendations for the following code?
#include <stdio.h>
#include "funcs.h"
// basically sizeof(int) using bit manipulation
unsigned int int_size(){
int size = 0;
for(unsigned int i = ~00u; i > 0; i >>= 1, size++);
return size;
}
// get a bit at a specific nth index
// index starts with 0 on the most significant bit
unsigned int bit_get(unsigned int data, unsigned int n){
return (data >> (int_size() - n - 1)) & 1;
}
// set a bit at a specific nth index
// index starts with 0 on the most significant bit
unsigned int bit_set(unsigned int data, unsigned int n){
return data | (1 << (int_size() - n - 1));
}
// gets the bit width of the data (<32)
unsigned int bit_width(unsigned int data){
int width = int_size();
for(; width > 0; width--)
if((data & (1 << width)) != 0)
break;
return width + 1;
}
// print the data contained in an unsigned int
void print_data(unsigned int data){
printf("%016X = ",data);
for(int i = 0; i < int_size(); i++)
printf("%X",bit_get(data,i));
putchar('\n');
}
// search for pattern in source (where pattern is n wide)
unsigned int bitpat_search(unsigned int source, unsigned int pattern,
unsigned int n){
int right = int_size() - n;
unsigned int mask = 0;
for(int i = 0; i < n; i++)
mask |= 1 << i;
for(int i = 0; i < right; i++)
if(((source & (mask << (right - i))) >> (right - i) ^ pattern) == 0)
return i - bit_width(source);
return -1;
}
// extract {count} bits from data starting at {start}
unsigned int bitpat_get(unsigned int data, int start, int count){
if(start < 0 || count < 0 || int_size() <= start || int_size() <= count || bit_width(data) != count)
return -1;
unsigned int mask = 1;
for(int i = 0; i < count; i++)
mask |= 1 << i;
mask <<= int_size() - start - count;
return (data & mask) >> (int_size() - start - count);
}
// set {count} bits (basically width of {replace}) in {*data} starting at {start}
void bitpat_set(unsigned int *data, unsigned int replace, int start, int count){
if(start < 0 || count < 0 || int_size() <= start || int_size() <= count || bit_width(replace) != count)
return;
unsigned int mask = 1;
for(int i = 0; i < count; i++)
mask |= 1 << i;
*data = ((*data | (mask << (int_size() - start - count))) & ~(mask << (int_size() - start - count))) | (replace << (int_size() - start - count));
}

Because your int_size() function returns the same value each time, you could save some time there:
unsigned int int_size(){
static unsigned int size = 0;
if (size == 0)
for(unsigned int i = ~00u; i > 0; i >>= 1, size++);
return size;
}
This way it calculates the value only once.
But replacing all calls of this function with sizeof(unsigned int) * CHAR_BIT (CHAR_BIT comes from <limits.h>) would be much better.
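A sketch of that replacement, using CHAR_BIT from <limits.h> rather than a literal 8; UINT_BITS is a hypothetical macro name. It counts storage bits, which equals the value bits counted by the loop on platforms without padding bits. Note also 1u instead of 1 in bit_set: shifting a plain int 1 into the sign bit is undefined behavior.

```c
#include <limits.h>

/* Bits in an unsigned int, known at compile time (no loop, no cache). */
#define UINT_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Same semantics as the question's versions: index 0 is the most significant bit. */
unsigned int bit_get(unsigned int data, unsigned int n)
{
    return (data >> (UINT_BITS - n - 1)) & 1u;
}

unsigned int bit_set(unsigned int data, unsigned int n)
{
    return data | (1u << (UINT_BITS - n - 1));
}
```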

I looked through your code and there's nothing that jumps out at me.
Overall, don't sweat the small stuff. If the code runs and works fine, no worries. If you are really concerned about performance, go ahead and run your code through a profiler.
The one thing I will say is that you might be dealing with "paranoia", which I see throughout your code, regarding the width of an int. I generally use the fixed-width types in stdint.h and give the caller some options regarding which width of integer (e.g. uint8_t, uint16_t, uint32_t) they want to deal with.
Also, C has bit-fields (they predate C99), which let you name individual bits or groups of bits inside a struct, although you cannot take their address or index them in a loop.
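A minimal sketch of bit-fields (the struct and field names are hypothetical). Each named field occupies the given number of bits; how the fields are ordered inside the storage unit is implementation-defined, so this is useful for naming bits, not for matching an externally defined layout.

```c
/* Hypothetical 8-bit status word described with bit-fields. */
struct flags {
    unsigned int ready  : 1;
    unsigned int error  : 1;
    unsigned int mode   : 2;   /* a 2-bit field holds 0..3 */
    unsigned int unused : 4;
};
```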

unsigned int int_size(){
return __builtin_popcount((unsigned int) -1);
}
This should be faster than looping (note that __builtin_popcount is a GCC/Clang builtin).

Calling int_size() in all the other functions seems like it's going to kill performance unless the compiler is really good at optimizing that loop out.
You could use a uint32_t instead of an int, and then you would know the size up front.
You could also use sizeof(int) to get the size of an int in bytes and multiply by 8. I haven't seen an environment that defines a byte as anything other than 8 bits, but the standard does allow it (CHAR_BIT in <limits.h> is implementation-defined and only required to be at least 8), so multiplying by CHAR_BIT is the portable choice.

how to make a bit-set/byte-array conversion in c

Given an array,
unsigned char q[32]="1100111...",
how can I generate a 4-byte bit-set, unsigned char p[4], such that each bit of this bit-set equals the corresponding value in the array, e.g., the first byte p[0] = "q[0] ... q[7]", the second byte p[1] = "q[8] ... q[15]", etc.
And how can I do the opposite, i.e., generate the array from a given bit-set?
My own attempt at the first part:
unsigned char p[4]={0};
for (int j=0; j<N; j++)
{
if (q[j] == '1')
{
p [j / 8] |= 1 << (7-(j % 8));
}
}
Is the above right? Are there any conditions to check? Is there a better way?
EDIT - 1
I wonder whether the above is efficient, as the array size could be up to 4096 or even more.

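First, use strtoul to get a 32-bit value. Then convert the byte order to big-endian with htonl. Finally, store the result in your array:
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
/* ... */
char q[33] = "1100111..."; /* 32 digits plus the terminating '\0' that strtoul needs */
unsigned char result[4] = {0};
uint32_t be = htonl((uint32_t)strtoul(q, NULL, 2));
memcpy(result, &be, sizeof be); /* copy exactly 4 bytes; storing an unsigned long could overflow result */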
There are other ways as well.
But I lack <arpa/inet.h>!
Then you need to know what byte order your platform is. If it's big endian, then htonl does nothing and can be omitted. If it's little-endian, then htonl is just:
unsigned long htonl(unsigned long x)
{
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
If you're lucky, your optimizer might see what you're doing and make it into efficient code. If not, well, at least it's all implementable in registers and O(log N).
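For reference, a sketch of the same two swap steps with uint32_t (swap32 is an illustrative name), so the masks behave identically whether long is 4 or 8 bytes:

```c
#include <stdint.h>

/* Unconditional 32-bit byte swap, i.e. what htonl does on a little-endian host. */
uint32_t swap32(uint32_t x)
{
    x = ((x & 0xFF00FF00u) >> 8) | ((x & 0x00FF00FFu) << 8);   /* swap bytes within each 16-bit half */
    x = ((x & 0xFFFF0000u) >> 16) | ((x & 0x0000FFFFu) << 16); /* swap the two halves */
    return x;
}
```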
If you don't know what byte order your platform is, then you need to detect it:
typedef union {
char c[sizeof(int) / sizeof(char)];
int i;
} OrderTest;
unsigned long htonl(unsigned long x)
{
OrderTest test;
test.i = 1;
if(!test.c[0])
return x;
x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8);
x = ((x & 0xFFFF0000) >> 16) | ((x & 0x0000FFFF) << 16);
return x;
}
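An alternative that needs neither htonl nor an endianness test is to write each byte out with a shift, since shifts operate on values rather than on the memory layout. A sketch, with bits_to_be32 as a made-up name and assuming at most 32 '0'/'1' characters followed by a '\0':

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a binary string and store the value big-endian into out[0..3].
   The result is the same on any host byte order. */
void bits_to_be32(const char *q, unsigned char out[4])
{
    uint32_t v = (uint32_t)strtoul(q, NULL, 2);
    out[0] = (unsigned char)(v >> 24);   /* most significant byte first */
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```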
Maybe long is 8 bytes!
Well, the OP implied 4-byte inputs with their array size, but an 8-byte long is doable:
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#define kCharsPerLong (sizeof(long) / sizeof(char))
char q[8 * kCharsPerLong + 1] = "1100111..."; /* +1 for the terminating '\0' */
unsigned char result[kCharsPerLong] = {0};
unsigned long be = htonl(strtoul(q, NULL, 2));
memcpy(result, &be, sizeof be); /* memcpy avoids aliasing and alignment problems */
unsigned long htonl(unsigned long x)
{
/* #if cannot evaluate sizeof, so pick the branch from ULONG_MAX instead */
#if ULONG_MAX == 0xFFFFFFFFUL
x = ((x & 0xFF00FF00UL) >> 8) | ((x & 0x00FF00FFUL) << 8);
x = ((x & 0xFFFF0000UL) >> 16) | ((x & 0x0000FFFFUL) << 16);
#elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
x = ((x & 0xFF00FF00FF00FF00UL) >> 8) | ((x & 0x00FF00FF00FF00FFUL) << 8);
x = ((x & 0xFFFF0000FFFF0000UL) >> 16) | ((x & 0x0000FFFF0000FFFFUL) << 16);
x = ((x & 0xFFFFFFFF00000000UL) >> 32) | ((x & 0x00000000FFFFFFFFUL) << 32);
#else
#error Unsupported word size.
#endif
return x;
}
For char that isn't 8 bits (DSPs like to do this), you're on your own. (This is why it was a Big Deal when the SHARC series of DSPs had 8-bit bytes; it made it a LOT easier to port existing code because, face it, C does a horrible job of portability support.)
What about arbitrary length buffers? No funny pointer typecasts, please.
The main thing that can be improved in the OP's version is the loop's internals. Instead of thinking of the output bytes as a fixed data register, think of them as a shift register, where each successive bit is shifted into the right (LSB) end. This will save you from all those divisions and mods (which, hopefully, are optimized away to bit shifts).
For sanity, I'm ditching unsigned char for uint8_t.
#include <stddef.h> /* size_t */
#include <stdint.h>
unsigned StringToBits(const char* inChars, uint8_t* outBytes, size_t numBytes,
size_t* bytesRead)
/* Converts the string of '1' and '0' characters in `inChars` to a buffer of
* bytes in `outBytes`. `numBytes` is the number of available bytes in the
* `outBytes` buffer. On exit, if `bytesRead` is not NULL, the value it points
* to is set to the number of bytes read (rounding up to the nearest full
* byte). If a multiple of 8 bits is not read, the last byte written will be
* padded with 0 bits to reach a multiple of 8 bits. This function returns the
* number of padding bits that were added. For example, an input of 11 bits
* will result in `bytesRead` being set to 2 and the function will return 5. This
* means that if a nonzero value is returned, then a partial byte was read,
* which may be an error.
*/
{ size_t bytes = 0;
unsigned bits = 0;
uint8_t x = 0;
while(bytes < numBytes)
{ /* Parse a character. */
switch(*inChars++)
{ case '0': x <<= 1; ++bits; break;
case '1': x = (x << 1) | 1; ++bits; break;
default: numBytes = 0;
}
/* See if we filled a byte. */
if(bits == 8)
{ outBytes[bytes++] = x;
x = 0;
bits = 0;
}
}
/* Padding, if needed. */
if(bits)
{ bits = 8 - bits;
outBytes[bytes++] = x << bits;
}
/* Finish up. */
if(bytesRead)
*bytesRead = bytes;
return bits;
}
It's your responsibility to make sure inChars is null-terminated. The function returns at the first character that is neither '0' nor '1', or when it runs out of output buffer. Some example usage:
char q[33] = "1100111..."; /* 32 digits plus the terminating '\0' */
uint8_t buf[4];
size_t bytesRead = 5;
if(StringToBits(q, buf, 4, &bytesRead) || bytesRead != 4)
{
/* Partial read; handle error here. */
}
This just reads 4 bytes, and traps the error if it can't.
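char q[4097] = "1100111..."; /* 4096 digits plus the terminating '\0' */
uint8_t buf[512] = {0};
StringToBits(q, buf, 512, NULL);
This converts what it can and pads a final partial byte with 0 bits; since buf starts zeroed, any bytes the function never reaches stay 0 as well.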
This function could be done better if C had the ability to break out of more than one level of loop or switch; as it stands, I'd have to add a flag value to get the same effect, which is clutter, or I'd have to add a goto, which I simply refuse.

I don't think that will quite work. You are comparing each "bit" to 1 when it should really be '1'. You can also make it a bit more efficient by getting rid of the if:
unsigned char p[4]={0};
for (int j=0; j<32; j++)
{
p [j / 8] |= (q[j] == '1') << (7-(j % 8));
}
Going in reverse is pretty simple too. Just mask for each "bit" that you set earlier.
unsigned char q[32]={0};
for (int j=0; j<32; j++) {
q[j] = ((p[j / 8] >> (7-(j % 8))) & 1) + '0';
}
You'll notice the creative use of (boolean) + '0' to convert between 1/0 and '1'/'0'.
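A sketch of the reverse direction as a function (bits_to_string is a hypothetical name). Shifting the wanted bit down to position 0 before masking yields exactly 0 or 1, so adding '0' gives '0' or '1'; because + binds more tightly than &, the parentheses matter.

```c
#include <stddef.h>

/* Expand nbytes of a bit-set into '0'/'1' characters, MSB of p[0] first,
   and terminate the string. `out` must hold 8 * nbytes + 1 chars. */
void bits_to_string(const unsigned char *p, size_t nbytes, char *out)
{
    for (size_t j = 0; j < 8 * nbytes; j++)
        out[j] = (char)('0' + ((p[j / 8] >> (7 - j % 8)) & 1));
    out[8 * nbytes] = '\0';
}
```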

Judging by your example, you are not going for readability, and after a (late) refresh my solution looks very similar to Chriszuma's, except that it drops parentheses by relying on operator precedence and adds !! to force a 0 or 1.
#include <stdio.h>
#define N 32 //N must be a multiple of 8 (a macro: a const size_t size would make these VLAs, which cannot be initialized)
char q[N+1] = "11011101001001101001111110000111";
unsigned char p[N/8] = {0};
char r[N+1] = {0}; //reversed
for(size_t i = 0; i < N; ++i)
p[i / 8] |= (q[i] == '1') << 7 - i % 8;
for(size_t i = 0; i < N; ++i)
r[i] = '0' + !!(p[i / 8] & 1 << 7 - i % 8);
printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
printf("%s\n%s\n", q,r);

If you are looking for extreme efficiency, try to use the following techniques:
Replace the if with subtraction of '0' (it seems you can assume your input symbols are only '0' or '1').
Also process the input from lower indices to higher ones.
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + q[c + b] - '0';
p[c / 8] = y;
}
Replace array indices by auto-incrementing pointers:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
int y = 0;
for (int b = 0; b < 8; ++b)
y = y * 2 + *qptr++ - '0';
*pptr++ = y;
}
Unroll the inner loop:
const char* qptr = q;
unsigned char* pptr = p;
for (int c = 0; c < N; c += 8)
{
*pptr++ =
qptr[0] - '0' << 7 |
qptr[1] - '0' << 6 |
qptr[2] - '0' << 5 |
qptr[3] - '0' << 4 |
qptr[4] - '0' << 3 |
qptr[5] - '0' << 2 |
qptr[6] - '0' << 1 |
qptr[7] - '0' << 0;
qptr += 8;
}
Process several input characters simultaneously (using bit twiddling hacks or MMX instructions); this has great speedup potential!
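As one illustration of processing eight characters at once, here is a SWAR multiply trick (a swapped-in technique, not something taken from the answer above). After subtracting 0x30 from every byte, each byte of a 64-bit word is 0 or 1, and multiplying by 0x8040201008040201 moves the eight bits into the top byte without any carries. A sketch assuming the input holds only '0' and '1'; pack8 and pack_bits are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8 ASCII '0'/'1' characters into one byte, first character -> MSB. */
static unsigned char pack8(const char *s)
{
    uint64_t v = 0;
    /* Place s[k] in byte k (little-endian order, independent of the host). */
    for (int k = 0; k < 8; ++k)
        v |= (uint64_t)(unsigned char)s[k] << (8 * k);
    v -= 0x3030303030303030ULL;   /* '0'/'1' -> 0/1 in every byte, no borrows */
    /* Byte k's bit lands at bit 63 - k; all other partial products stay
       below bit 56 or fall off the top, so the top byte is the result. */
    return (unsigned char)((v * 0x8040201008040201ULL) >> 56);
}

/* Convert 8 * nbytes characters from q into nbytes bytes in p. */
void pack_bits(const char *q, unsigned char *p, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; ++i)
        p[i] = pack8(q + 8 * i);
}
```

Assembling v byte by byte keeps the sketch independent of host byte order; on a little-endian machine it is equivalent to a single 8-byte memcpy load, which is where an actual speedup would come from.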