Unsigned integer overflow does not "wrap around" - C

I read the following line in the Integer Overflow wiki:
while unsigned integer overflow causes the number to be reduced modulo
a power of two, meaning that unsigned integers "wrap around" on
overflow.
I have the code below, where I am trying to create a hash function, and I ran into an int overflow situation. I tried to mitigate it by using unsigned int, but that didn't work and I was still seeing negative values.
I know I can handle it another way, as shown in my code at Comment 2:, and that works. But is that the right way, and why was the unsigned int not wrapping around on overflow?
int hash(char *word) {
    char *temp = word;
    unsigned int hash = 0; // Comment 1: I tried to handle int overflow using "unsigned" int.
    while (*word != '\0') {
        // Comment 2: This works but I do not want to go this way.
        //while ((hash * PRIME_MULTIPLIER) < 0) {
        //    hash = (hash * PRIME_MULTIPLIER) + 2147483647;
        //}
        hash = hash * PRIME_MULTIPLIER + *word;
        word++;
    }
    printf("Hash for %s is %d\n", temp, hash);
    return hash;
}

You're using the wrong format specifier for printf. For an unsigned int, you should be using %u instead of %d.
Also, you should be returning an unsigned int instead of an int.
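A minimal corrected sketch (PRIME_MULTIPLIER is assumed to be defined elsewhere, as in the question):

unsigned int hash(char *word) {
    char *temp = word;
    unsigned int hash = 0;
    while (*word != '\0') {
        hash = hash * PRIME_MULTIPLIER + *word;  /* unsigned arithmetic wraps modulo 2^N */
        word++;
    }
    printf("Hash for %s is %u\n", temp, hash);   /* %u matches the unsigned int argument */
    return hash;
}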

Related

What is the behavior when a char is compared with an unsigned short in C?

When I run the following program:
#include <stdio.h>

void func(unsigned short maxNum, unsigned short di)
{
    if (di == 0) {
        return;
    }
    char i;
    for (i = di; i <= maxNum; i += di) {
        printf("%u ", i);
    }
    printf("\n");
}

int main(int argc, char **argv)
{
    func(256, 100);
    return 0;
}
It is an endless loop, but I wonder: when a char is compared with an unsigned short, is the char converted to unsigned short? In this situation, has the char overflowed and become larger than maxNum? I really do not know how to explain the results of this program.
Implementation-defined behavior, undefined behavior, and CHAR_MAX < 256
Let us sort out:
... unsigned short maxNum
... unsigned short di
char i;
for (i = di; i <= maxNum; i += di) {
    printf("%u ", i);
}
char may be a signed char or an unsigned char. Let us assume it is signed.
unsigned short may have the same range as unsigned when both are 16-bit. Yet it is more common to find unsigned short as 16-bit and int and unsigned as 32-bit.
Other possibilities exist, but let us go forward with the above two assumptions.
i = di could be interesting if the value assigned were outside the range of a char, but 100 is always within char range, so i is 100.
Each argument in i <= maxNum goes through the usual integer promotions, so the signed char i first becomes an int 100, and the 16-bit maxNum becomes an int 256. As 100 < 256 is true, the loop body is entered. Notice that i can never hold a value as large as 256, since CHAR_MAX is less than 256 - even on later iterations. This explains the endless loop we see. But wait, there's more.
With printf("%u ", i);, printf() expects a matching unsigned argument. But i, as a type with less range than int, gets promoted to an int with the same value as part of a ... argument. Printing with a mismatched specifier and type is usually undefined behavior, with one exception: when the value is representable in both the signed and unsigned type. As the first value, 100, is representable in both, all is OK.
At the loop's end, i += di is like i = i + di;. The addition arguments go through the usual integer promotions and become int 100 added to int 100. That sum is 200. So far nothing strange. Yet assigning 200 to a signed char converts it, as it is out of range. This is implementation-defined behavior. The assigned value could have been 0 or 1 or 2.... Typically, the value is wrapped around ("modded") by adding or subtracting 256 until it is in range: 100 + 100 - 256 --> -56.
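A tiny sketch of that conversion in isolation, assuming char is signed, 8 bits wide, and has the common wrapping behavior:

signed char i = 100;
i = i + 100;   /* i + 100 is int 200, out of range for signed char */
               /* implementation-defined result; typically 200 - 256 = -56 */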
But the 2nd printf("%u ", i); attempts to print -56, and that is undefined behavior.
Tip: enable all warnings. Good compilers will point out many of these problems and save you time.
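A minimal sketch of one possible fix, assuming the goal is simply to count up to maxNum: widen i so it can represent every unsigned short value, which keeps both the comparison and the %u well defined.

#include <stdio.h>

void func(unsigned short maxNum, unsigned short di)
{
    if (di == 0) {
        return;
    }
    unsigned int i;   /* can hold any unsigned short value, so no wrap below 256 */
    for (i = di; i <= maxNum; i += di) {
        printf("%u ", i);
    }
    printf("\n");
}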
I got the answer from http://www.idryman.org/blog/2012/11/21/integer-promotion/ : both the char and the unsigned short are converted to int, which explains the process and result of this program.

Why does this djb2 implementation's loop terminate?

A string is terminated by a single null byte. Since an int is bigger than a char, how can the int become 0 and terminate the loop consistently?
source : http://www.cse.yorku.ca/~oz/hash.html
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
Loading an integer from a smaller type does not preserve the bits that the smaller type didn't have; they are cleared (or set by sign-extension, for signed types).
So:
int x = 0xfeefd00d;
x = (char) 1;
leaves the value 1, as an integer, in x, not 0xfeefd001.
When a variable is used in an expression together with variables of different types (like in the assignment in the loop condition), there's an implicit conversion being made. Conversions only convert between types, but where possible they keep the value.
So when you reach the null-terminator in str, it's converted (promoted actually) to an int, keeping the value 0. And 0 is always "false", which ends the loop.
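A minimal sketch that demonstrates this (the string here is just for illustration):

#include <stdio.h>

int main(void)
{
    unsigned char str[] = "hi";   /* bytes: 'h', 'i', '\0' */
    unsigned char *p = str;
    int c;
    while (c = *p++)              /* '\0' is promoted to int 0, which is false */
        printf("%d ", c);         /* prints the codes for 'h' and 'i', then stops */
    return 0;
}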

unsigned char to unsigned char array of 8 original bits

I am trying to take a given unsigned char and store the 8 bit value in an unsigned char array of size 8 (1 bit per array index).
So given the unsigned char 'A',
I'd like to create an unsigned char array containing 0 1 0 0 0 0 0 1 (one number per index).
What would be the most effective way to achieve this? Happy Thanksgiving btw!!
The fastest (not sure if that's what you meant by "effective") way of doing this is probably something like
void char2bits1(unsigned char c, unsigned char * bits) {
    int i;
    for(i = sizeof(unsigned char)*8; i; c >>= 1) bits[--i] = c & 1;
}
The function takes the char to convert as the first argument and fills the array bits with the corresponding bit pattern. It runs in 2.6 ns on my laptop. It assumes 8-bit bytes, but not how many bytes long a char is, and does not require the input array to be zero-initialized beforehand.
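For example, a minimal usage sketch (assuming ASCII, where 'A' is 65):

unsigned char bits[8];
char2bits1('A', bits);
for (int i = 0; i < 8; i++)
    printf("%u ", bits[i]);   /* prints 0 1 0 0 0 0 0 1 */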
I didn't expect this to be the fastest approach. My first attempt looked like this:
void char2bits2(unsigned char c, unsigned char * bits) {
    for(; c; ++bits, c >>= 1) *bits = c & 1;
}
I thought this would be faster by avoiding array lookups, by looping in the natural order (at the cost of producing the bits in the opposite order of what was requested), and by stopping as soon as c is zero (so the bits array would need to be zero-initialized before calling the function). But to my surprise, this version had a running time of 5.2 ns, double that of the version above.
Investigating the corresponding assembly revealed that the difference was loop unrolling, which was being performed in the former case but not the latter. So this is an illustration of how modern compilers and modern CPUs often have surprising performance characteristics.
Edit: If you actually want the unsigned chars in the result to be the chars '0' and '1', use this modified version:
void char2bits3(unsigned char c, unsigned char * bits) {
    int i;
    for(i = sizeof(unsigned char)*8; i; c >>= 1) bits[--i] = '0' + (c & 1);
}
You could use bit operators as recommended.
#include <stdio.h>

int main(void) {
    unsigned char input_data = 8;
    unsigned char array[8] = {0};
    int idx = sizeof(array) - 1;

    while (input_data > 0) {
        array[idx--] = input_data & 1;
        input_data /= 2; // or input_data >>= 1;
    }
    for (unsigned long i = 0; i < sizeof(array); i++) {
        printf("%d, ", array[i]);
    }
}
Take the value, right shift it and mask it to keep only the lower bit. Add the value of the lower bit to the character '0' so that you get either '0' or '1' and write it into the array:
unsigned char val = 65;
unsigned char valArr[8+1] = {0};   /* the extra byte keeps the array null-terminated */
for (int loop = 0; loop < 8; loop++)
    valArr[7-loop] = '0' + ((val >> loop) & 1);
printf("val = %s", valArr);

Turning a hash key (unsigned long) into an int safely

I have a hash function:
unsigned long strhash(char *string)
{
    unsigned long hash = 5381;
    int c;
    while (c = *string++)
    {
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}
and my program calls it like so:
char testString[] = "Hello World";
unsigned long hashcode = 0;
hashcode = strhash(testString);
int slot = 0;
slot = hashcode%30;
printf("%d\n", slot);
The modulo is by what will be the size of my array.
Is this safely converting from unsigned long to int?
It prints out 17, which makes me feel like it is working, but I am unsure.
Note: I assume you meant printf("%d\n", slot); in the last statement.
By doing a modulo with 30, you are effectively restricting the hash value to only 30 unique values (0 to 29). This drastically reduces the effectiveness of the hash, as many different strings will be mapped to the same hash value (one out of those 30).
This link gives you a number of alternative hash functions, including an unsigned int version of the djb2 algorithm in your question. Use one of them instead.
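As for the conversion itself, a minimal sketch of why this particular assignment is safe (the names come from the question):

unsigned long hashcode = strhash(testString);
int slot = (int)(hashcode % 30);   /* hashcode % 30 is always 0..29, which fits in any int */
printf("%d\n", slot);              /* so the unsigned long -> int conversion cannot overflow */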

Function Returning Value of Type I don't Want it to Return

For a problem at school, I need to convert an ASCII string of character digits to a decimal value. I wrote a function to do this and specified the return type to be an unsigned short, as you can see in the code below.
#include <stdio.h>

unsigned short str_2_dec(char* input_string, int numel);

int main()
{
    short input;
    char input_string[6] = "65535";
    input = str_2_dec(input_string, 5);
    printf("Final Value: %d", input);
    return 0;
}
unsigned short str_2_dec(char* input_string, int numel)
{
    int factor = 1;
    unsigned short value = 0;
    int index;

    for (index = 0; index < (numel-1); index++)
    {
        factor *= 10;
    }
    for (index = numel; index > 0; index--)
    {
        printf("Digit: %d; Factor: %d; ", *(input_string+(numel-index))-48, factor);
        value += factor * ((*(input_string+(numel-index))-48));
        printf("value: %d\n\n", value);
        factor /= 10;
    }
    return value;
}
When running this code, the program prints -1 as the final value instead of 65535. It seems it's displaying the corresponding signed value anyway. Seems like something very simple, but I can't find an answer. A response would be greatly appreciated.
The return type of str_2_dec() is unsigned short, but you are storing the value in a (signed) short variable. You should declare your variables with the appropriate type, otherwise you will have problems like the one you have observed.
In this case, you converted "65535" to an unsigned short, which has the bit pattern 0xFFFF. That bit pattern was reinterpreted as a (signed) short, which is the decimal value -1.
You should change your main() to something like this:
int main()
{
    unsigned short input; /* to match the type the function is returning */
    char input_string[6] = "65535";
    input = str_2_dec(input_string, 5);
    printf("Final Value: %hu", input);
    return 0;
}
The problem is that you are taking the unsigned short return value of the function and storing it in a (signed) short variable, input. Since the value is outside the range representable in short, and since short is signed, this results in either an implementation-defined result or an implementation-defined signal being raised.
Change the type of input to unsigned short and everything will be fine.
You mean that it is printing input as if it were a (signed) short here?
short input;
...
printf("Final Value: %d", input);
Update: Since the hint doesn't seem to be catching on, I will be more direct: your declaration of input should be unsigned short input;.
You are using the wrong format specifier in printf. Try using %u instead of %d.
The problem isn't with the function but with how you are printing the return value.
printf("Final Value: %d", input);
The %d is a placeholder for the int type, not short.
Use %hu instead.
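For example:

unsigned short input = 65535;
printf("Final Value: %hu\n", input);   /* prints 65535 */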
You didn't use the correct format specifier for the
short input;
printf("final value=%d\n", input);
This is what makes the difference in your output.
