Why does this djb2 implementation's loop terminate? - c

A string is terminated by a single null byte. Since an int is bigger than a char, how can the int become 0 and terminate the loop consistently?
Source: http://www.cse.yorku.ca/~oz/hash.html
unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

Assigning a value of a smaller type to an integer does not preserve the bits that the smaller type doesn't have; the upper bits are cleared (or filled by sign extension, for signed types).
So:
int x = 0xfeefd00d;
x = (char) 1;
leaves the value 1, as an integer, in x, not 0xfeefd001.

When a variable is used in an expression together with operands of a different type (as in the assignment in the loop condition), an implicit conversion is made. A conversion changes only the type and, where possible, preserves the value.
So when you reach the null terminator in str, it is converted (promoted, actually) to an int, keeping the value 0. And 0 is always "false", which ends the loop.
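As a minimal sketch (the string here is just for illustration), the promotion at the end of the string looks like this:
unsigned char s[] = "ab";
int c;
c = s[0];   /* 'a' (97) promoted to int 97: non-zero, the loop body runs */
c = s[2];   /* '\0' promoted to int 0: the condition is false, the loop ends */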

Related

Bit Operation in C

I am new to C and this is for a school project. I am implementing the Skinny Block Cipher in C.
My code:
unsigned char *bits[8]; // this array holds 1 byte of data.
... call in another func to convert hex to bit.
unsigned int four = bits[4] - '0'; // value 0
unsigned int seven = bits[7] - '0'; // value 1
unsigned int six = bits[6] - '0'; // value 1
four = four ^ ~(seven | six); // eq 1;
Now, my questions:
Do I have to convert the char to int every time to run the bit operation? What will happen if I do it using unsigned char?
If I store the value of eq 1 in an unsigned int, the value is fe, which is wrong (according to an online bit calculator); on the other hand, if I store the result in an unsigned char, the value is -2, which is correct. What's the difference? I am kind of lost here.
bits[8] is a pointer, and I tried to do eq 1 using indexes from the bits pointer, like bits[4], etc., but VSCode throws an error and I don't understand why. Obviously, I have some gaps in my knowledge. I am using my Python knowledge to get through this.
I don't know if I am giving all the information that's needed. Hit me up for extras!
TIA.
I updated the code:
unsigned char bits[9];
It converts a3 into 10100011.
unsigned char *bits[8]; // this array holds 1 byte of data.
No, it is an array of 8 pointers to unsigned char.
unsigned int four = bits[4] - '0'; // value 0
This will not work as intended: you are subtracting the integer '0' from a pointer, which yields another pointer rather than the digit's value.
If you want to keep the string representation of the number in binary form, you need to define an array of 9 chars:
char bits[9] = "10010110";
Then you can do the operations as in your code.
Do I have to convert the char to int every time to run the bit
operation? What will happen if I do it using unsigned char?
If you want to keep it as a string, then yes. If you instead store the value in a single byte, you can test the individual bits directly:
unsigned char x = 0x96;
unsigned int four = !!(x & (1 << 4));
unsigned int seven = !!(x & (1 << 7));
unsigned int six = !!(x & (1 << 6));
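On the fe vs. -2 point, a minimal sketch (assuming a 32-bit unsigned int) showing that both descriptions refer to the same low byte:
unsigned int four = 0, seven = 1, six = 1;
unsigned int r = four ^ ~(seven | six);   /* 0xFFFFFFFE: all bits set except bit 0 */
unsigned char b = r & 0xFF;               /* 0xFE, i.e. 254: the same bit pattern as -2 in a signed byte */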

Converting an integer to a binary string in c

This is what I've tried. My logic is that tmp, shifted left 31 times, gets compared against the user-input integer I, a value of 1 or 0 is inserted at indexes str[0] -> str[31], and then I null-terminate str[32] with the \0.
However, I'm getting a segmentation fault.
P.S. I'm not allowed to change the parameters of this function, and my professor set the size of str to 33 in main, which I'm not allowed to change either.
void int2bitstr(int I, char *str) {
    int tmp = 1 << 31;
    do {
        *str++ = !!(tmp & I) + '0';
    } while (tmp >>= 1);
    *str = '\0';
}
Try making tmp an unsigned int. The behaviour of right-shifting a negative (signed) integer is implementation-defined, and in your case is likely shifting in 1s (the original MSB) thus causing the loop to exceed the length of str.
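A sketch of that fix, keeping the required interface (and assuming 32-bit ints, as the 33-byte buffer implies):
void int2bitstr(int I, char *str) {
    unsigned int tmp = 1u << 31;   /* unsigned, so the right shift brings in 0s */
    do {
        *str++ = !!(tmp & I) + '0';
    } while (tmp >>= 1);           /* tmp reaches 0 after 32 iterations */
    *str = '\0';                   /* written to str[32], filling the 33-byte buffer exactly */
}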

summing unsigned and signed ints, same or different answer?

If I have the following code in C
#include <stdio.h>

int main()
{
    int x = <a number>;
    int y = <a number>;
    unsigned int v = x;
    unsigned int w = y;
    int ssum = x + y;
    unsigned int usum = v + w;

    printf("%d\n", ssum);
    printf("%d\n", usum);

    if (ssum == usum) {
        printf("Same\n");
    } else {
        printf("Different\n");
    }
    return 0;
}
Which would print the most? Would they be equal, since signed and unsigned would produce the same result? For example, if you have a negative like -1, when it gets assigned to int x it becomes 0xFF, and if you want to do -1 + (-1), doing it the signed way gives -2 = 0xFE; since the unsigned variables would also be set to 0xFF, adding them would still give 0xFE. The same holds true for 2 + (-3) or -2 + 3; in the end the hexadecimal values are identical. So in C, is that what's looked at when it sees signedSum == unsignedSum? Does it not care that one is actually a large number and the other is -2, as long as the 1's and 0's are the same?
Are there any values that would make this not true?
The examples you have given are incorrect in C. Also, converting between signed and unsigned types is not required to preserve bit patterns (the conversion is by value), although with some representations bit patterns are preserved.
There are circumstances where the result of operations will be the same, and circumstances where the result will differ.
If the (actual) sum of adding two ints would overflow an int (i.e. a value outside the range an int can represent), the result is undefined behaviour. Anything can happen at that point (including the program terminating abnormally); subsequently converting to an unsigned doesn't change anything.

Converting an int with a negative value to unsigned int uses modulo arithmetic (modulo the maximum value that an unsigned can represent, plus one). That is well defined by the standard, but means -1 (type int) will convert to the maximum value that an unsigned can represent (i.e. UINT_MAX, an implementation-defined value specified in <limits.h>).

Similarly, adding two variables of type unsigned int always uses modulo arithmetic.
Because of things like this, your question "which would print the most?" is meaningless.
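A small illustration of that modulo behaviour (assuming a 32-bit unsigned int, so UINT_MAX is 4294967295):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = -1;         /* -1 converts to UINT_MAX by modulo arithmetic */
    unsigned int s = u + 2u;     /* wraps around modulo UINT_MAX + 1, leaving 1  */
    printf("%u %u %u\n", u, UINT_MAX, s);   /* prints 4294967295 4294967295 1 */
    return 0;
}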

turning hash key (unsigned long) to int safely

I have a hash function:
unsigned long strhash(char *string)
{
    unsigned long hash = 5381;
    int c;

    while (c = *string++)
    {
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}
and my program calls it like so
char testString[] = "Hello World";
unsigned long hashcode = 0;
hashcode = strhash(testString);
int slot = 0;
slot = hashcode%30;
printf("%d\n", slot);
The modulo is by what will be the size of my array.
Is this safely converting from unsigned long to int? It prints out 17, making me feel like it is working, but I am unsure.
Note: I assume you meant printf("%d\n", slot); in the last statement.
By doing a modulo with 30, you are effectively restricting the hash value to only 30 unique values (0 to 29). This drastically reduces the effectiveness of the hash, as many different strings will be mapped to the same hash value (one out of those 30).
This link gives you a number of alternative hash functions, including an unsigned int version of the djb2 algorithm in your question. Use one of them instead.
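For what it's worth, the narrowing itself is safe here, because the modulo result always fits in an int; a one-line sketch:
int slot = (int)(hashcode % 30);   /* always 0..29, well within the range of int */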

How to convert from integer to unsigned char in C, given integers larger than 256?

As part of my CS course I've been given some functions to use. One of these functions takes a pointer to unsigned chars to write some data to a file (I have to use this function, so I can't just make my own purpose-built function that works differently, BTW). I need to write an array of integers whose values can be up to 4095 using this function (that only takes unsigned chars).
However, am I right in thinking that an unsigned char can only have a max value of 256 because it is 1 byte long? Do I therefore need to use 4 unsigned chars for every integer? But casting doesn't seem to work with larger values of the integer. Does anyone have any idea how best to convert an array of integers to unsigned chars?
Usually an unsigned char holds 8 bits, with a max value of 255. If you want to know this for your particular compiler, print out CHAR_BIT and UCHAR_MAX from <limits.h>. You could extract the individual bytes of a 32-bit int:
#include <stdint.h>

void
pack32(uint32_t val, uint8_t *dest)
{
    dest[0] = (val & 0xff000000) >> 24;
    dest[1] = (val & 0x00ff0000) >> 16;
    dest[2] = (val & 0x0000ff00) >> 8;
    dest[3] = (val & 0x000000ff);
}

uint32_t
unpack32(uint8_t *src)
{
    uint32_t val;

    val  = (uint32_t)src[0] << 24;   /* cast so the shift happens in unsigned arithmetic */
    val |= (uint32_t)src[1] << 16;
    val |= (uint32_t)src[2] << 8;
    val |= src[3];
    return val;
}
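For example, a quick sketch of using the two helpers above to round-trip one of the values from the question:
uint8_t buf[4];
pack32(4095u, buf);               /* buf now holds 00 00 0f ff */
uint32_t back = unpack32(buf);    /* back == 4095 */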
An unsigned char generally has a size of 1 byte, so you can decompose any other type into an array of unsigned chars (e.g. for a 4-byte int you can use an array of 4 unsigned chars). Your exercise is probably about generics. You should write the file as a binary file using the fwrite() function, and just write byte after byte into the file.
The following example should write a number (of any data type) to the file. I am not sure if it works since you are forcing the cast to unsigned char * instead of void *.
#include <stdio.h>

int homework(unsigned char *foo, size_t size)
{
    // open the file for binary writing
    FILE *f = fopen("work.txt", "wb");
    if (f == NULL)
        return 1;

    // write the data to the file, byte by byte
    fwrite(foo, sizeof(char), size, f);

    fclose(f);
    return 0;
}
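A hypothetical call (the array contents are made up purely for illustration) could look like:
int data[] = { 4095, 17, 256 };
homework((unsigned char *)data, sizeof(data));   /* writes the raw bytes of all three ints */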
I hope the given example at least gives you a starting point.
Yes, you're right; a char/byte typically holds only 8 bits, so that is 2^8 distinct values, which is zero to 2^8 - 1, or zero to 255. Do something like this to get the bytes:
int x = 0;
char* p = (char*)&x;
for (int i = 0; i < sizeof(x); i++)
{
    //Do something with p[i]
}
(This isn't officially C because of the order of declaration but whatever... it's more readable. :) )
Do note that this code may not be portable, since it depends on the processor's internal storage of an int.
If you have to write an array of integers then just convert the array into a pointer to char then run through the array.
int main()
{
    int data[] = { 1, 2, 3, 4, 5 };
    size_t size = sizeof(data)/sizeof(data[0]); // Number of integers.
    unsigned char* out = (unsigned char*)data;

    for (size_t loop = 0; loop < (size * sizeof(int)); ++loop)
    {
        MyProfSuperWrite(out + loop); // Write 1 unsigned char
    }
}
Now people have mentioned that 4096 will fit in fewer bits than a normal integer. Probably true. Thus you can save space and not write out the top bits of each integer. Personally I think this is not worth the effort. The extra code to write the value and process the incoming data is not worth the savings you would get (maybe if the data were the size of the Library of Congress). Rule one: do as little work as possible (it's easier to maintain). Rule two: optimize if asked (but ask why first). You may save space, but it will cost in processing time and maintenance costs.
The part of the assignment that says "integers whose values can be up to 4095 using this function (that only takes unsigned chars)" should be giving you a huge hint. 4095 unsigned is 12 bits.
You can store the 12 bits in a 16-bit short, but that is somewhat wasteful of space -- you are only using 12 of the 16 bits of the short. Since you are dealing with more than 1 byte in the conversion to characters, you may need to deal with the endianness of the result. This is the easiest option.
You could also use a bit field or some packed binary structure if you are concerned about space. That is more work.
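A rough sketch of the simpler two-bytes-per-value split (the variable names are just illustrative):
unsigned int value = 4095;                          /* any value in 0..4095 */
unsigned char hi = (value >> 8) & 0x0F;             /* top 4 bits */
unsigned char lo = value & 0xFF;                    /* low 8 bits */
unsigned int back = ((unsigned int)hi << 8) | lo;   /* reassembled: back == 4095 */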
It sounds like what you really want to do is call sprintf to get a string representation of your integers. This is a standard way to convert from a numeric type to its string representation. Something like the following might get you started:
char num[5]; // Room for "4095" plus the terminating NUL
// Array is the array of integers, and arrayLen is its length
for (int i = 0; i < arrayLen; i++)
{
    sprintf (num, "%d", array[i]);
    // Call your function that expects a pointer to chars
    printfunc (num);
}
Without information on the function you are directed to use regarding its arguments, return value and semantics (i.e. the definition of its behaviour), it is hard to answer. One possibility is:
Given:
void theFunction(unsigned char* data, int size);
then
int array[SIZE_OF_ARRAY];
theFunction((unsigned char*)array, sizeof(array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(*array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(int));
All of which will pass all of the data to theFunction(), but whether that makes any sense will depend on what theFunction() does.
