Why can scanf change another variable that is not one of its arguments? - c

Here is code
#include <stdio.h>
int main(){
unsigned char mem[32];
int i,j;
for(i=0;i<32;i++){
unsigned char a[8];
scanf("%s",a);
for(j = 0;j<8;j++){
mem[i] <<=1;
mem[i] |= a[j] == '0' ? 0 : 1;
}
}
...
}
Each input line is a number in binary representation. I want to read them and store them in an unsigned char array. When i equals 0, mem[0] = 0x3E. But when i equals 1, mem[0] changes to 0x0 as soon as scanf executes. The other inputs are fine. I have no idea why. The input is as follows:
00111110
10100000
01010000
11100000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00111111
10000000
00000010
11000010
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
11111111
10001001

You are invoking undefined behavior by having scanf() write out-of-bounds of the array a.
8-character strings like 00111110 will occupy 9 bytes of the memory including the terminating null-character, so you have to allocate enough buffer.
Also you should limit the number of characters to read to prevent buffer overrun.
Another point is that you should check whether scanf() succeeded in reading what is expected.
There is also another undefined behavior: you used the values of mem[i], which is an uninitialized non-static local variable. Such values are indeterminate and mustn't be used in calculations.
In conclusion, the part
unsigned char a[8];
scanf("%s",a);
should be
char a[9];
if (scanf("%8s",a) != 1) {
fputs("read error\n", stderr);
return 1;
}
mem[i] = 0;
Also note that I used char instead of unsigned char because the %s format specifier expects char*, and there doesn't seem to be any code that requires unsigned char instead of char.
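The fixes above can be combined into a small sketch. The helper names parse_bits8 and read_bytes are hypothetical and not part of the original code; the sketch assumes the input is one 8-digit binary string per line, as in the question.

```c
#include <stdio.h>
#include <string.h>

/* Convert a string of exactly 8 '0'/'1' characters into one byte.
 * Returns 0 on success, -1 if the string is not 8 binary digits.
 * (Hypothetical helper, not from the original post.) */
int parse_bits8(const char *s, unsigned char *out)
{
    if (strlen(s) != 8)
        return -1;
    unsigned char value = 0;              /* start from a known value */
    for (int j = 0; j < 8; j++) {
        if (s[j] != '0' && s[j] != '1')
            return -1;
        value = (unsigned char)((value << 1) | (s[j] - '0'));
    }
    *out = value;
    return 0;
}

/* Read up to count tokens of 8 binary digits from fp into mem.
 * Returns the number of bytes actually stored. */
size_t read_bytes(FILE *fp, unsigned char *mem, size_t count)
{
    char a[9];                            /* 8 digits + terminating '\0' */
    size_t n = 0;
    while (n < count && fscanf(fp, "%8s", a) == 1) {
        if (parse_bits8(a, &mem[n]) != 0)
            break;                        /* stop at the first malformed token */
        n++;
    }
    return n;
}
```

Calling read_bytes(stdin, mem, 32) would then replace the outer loop in the question, and the return value tells the caller how many entries of mem are valid.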

a is too short; it has to be char a[9] to accommodate the terminating null character. Also use scanf("%8s",a);

Related

getting values of void pointer while only knowing the size of each element

I'll start by saying I've seen a bunch of posts with similar titles, but none focus on my question.
I've been tasked with writing a function that receives a void* arr, an unsigned int sizeofArray, and an unsigned int sizeofElement.
I managed to iterate through the array with no problem. However, when I try to print the values or do anything with them, I get garbage unless I specify their type beforehand.
This is my function:
void MemoryContent(void* arr, unsigned int sizeRe, unsigned int sizeUnit)
{
int sizeArr = sizeRe/sizeUnit;
for (int i = 0; i < sizeArr ; i++)
{
printf("%d\n",arr); // this one prints garbage
printf("%d\n",*(int*)arr); // this one prints expected values given the array is of int*
arr = arr + sizeUnit;
}
}
The output of this with the following array (int arr[] = {1, 2, 4, 8, 16, 32, -1};) is:
-13296 1
-13292 2
-13288 4
-13284 8
-13280 16
-13276 32
-13272 -1
I realize I have to specify the type somehow. The printf won't actually be used, since I need the binary representation of whatever value is in there (already taken care of in a different function), but I'm still not sure how to get the actual value without casting while knowing only the size of the element.
Any explanation would be highly appreciated!
Note: the compiler used is gcc, so arithmetic on void* works as used here.
Edit for clarification:
After formatting, the output for the array in the previous example should look like this:
00000000 00000000 00000000 00000001 0x00000001
00000000 00000000 00000000 00000010 0x00000002
00000000 00000000 00000000 00000100 0x00000004
00000000 00000000 00000000 00001000 0x00000008
00000000 00000000 00000000 00010000 0x00000010
00000000 00000000 00000000 00100000 0x00000020
11111111 11111111 11111111 11111111 0xFFFFFFFF
It is not possible to get the values behind a void pointer while knowing only the size of each element.
Say the size is 4. Is the element an int32_t, uint32_t, float, bool, some struct, or enum, a pointer, etc? Are any of the bits padding? The proper interpretation of the bits requires more than only knowing the size.
Code could print out the bits at void *ptr and leave the interpretation to the user.
unsigned char bytes[sizeUnit];
memcpy(bytes, ptr, sizeUnit);
for (size_t i = 0; i<sizeof bytes; i++) {
printf(" %02X", bytes[i]);
}
Simplifications exist.
OP's code void* arr, ... arr = arr + sizeUnit; is not portable, as adding to a void * is not defined by the C standard. Some compilers do allow it as an extension, treating the pointer as if it were a char pointer.
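A portable variant of the loop can be sketched by walking the array through an unsigned char pointer, for which arithmetic is defined by the standard. The names element_to_hex and memory_content are hypothetical, and the 16-byte element limit is an assumption made only to keep the buffer fixed-size.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the bytes of one element, in memory order, as two hex digits each.
 * out must have room for 2 * size + 1 characters. */
void element_to_hex(const void *elem, size_t size, char *out)
{
    const unsigned char *bytes = elem;   /* any object may be read as unsigned char */
    for (size_t i = 0; i < size; i++)
        sprintf(out + 2 * i, "%02X", (unsigned)bytes[i]);
    out[2 * size] = '\0';
}

/* Print one line of hex per element; interpretation is left to the reader. */
void memory_content(const void *arr, size_t count, size_t size_unit)
{
    const unsigned char *p = arr;        /* char-type pointer: arithmetic is standard C */
    char line[2 * 16 + 1];               /* assumption: elements are at most 16 bytes */
    if (size_unit > 16)
        return;
    for (size_t i = 0; i < count; i++) {
        element_to_hex(p + i * size_unit, size_unit, line);
        puts(line);
    }
}
```

The bytes are shown in memory order, so on a little-endian machine an int holding 1 would appear with its 01 byte first.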

Polynomial Hashing vs Cyclic Polynomial shifting for strings

I am using this function for cyclic shift:
int hashcyclic(char *p, int len)
{
unsigned int h = 0;
int i;
for (i = 0; i < len; i++)
{
h = (h << 5) | (h >> 27);
h += (unsigned int)p[i];
}
return h%TABLESIZE;
}
On a text file with around 20K lines (one word/line) total amount of collisions is 45187. On a text file with 40K+ lines (again, one word/line) there are 12922252 (!) collisions with the same algorithm.
With polynomial hashing:
int hashpoly(char *K)
{
int h = 0, a = 33;
for (; *K != '\0'; K++)
h = (a * h + *K) % TABLESIZE;
return h;
}
Now I'm getting around 25K collisions on the 20K word file and 901K collisions on the 40K word file(almost 12 times less than the cyclic shift).
My question is: does this make sense, or is one of my implementations messed up? I was expecting cyclic shift to work best for my strings (the 40K-word file is a series of 8-letter words separated by newlines), but polynomial hashing produces significantly fewer collisions.
int HashInsertPoly(Table T, KeyType K, InfoType I)
{
int i;
int ProbeDecrement;
i = hashpoly(K);
ProbeDecrement = p(K);
while (T[i].Key[0] != EmptyKey)
{
totalcol++;
T[i].Info.col++;
i -= ProbeDecrement;
if (i < 0)
i += TABLESIZE;
}
strcpy(T[i].Key, K);
insertions++;
/*T[i].Info = I;*/
return i;
}
The same HashInsert function applies to the hash with cyclic shift, except now I call hashcyclic instead of hashpoly
My hunch is that the variation in plain text words isn't high, and so the cyclic hash isn't chaotic enough.
Let's look at two strings "cat" and "dog".
cat
c 01100011
a 01100001
t 01110100
h starts at
00000000 00000000 00000000 01100011 (c)
and is then cycled to
00000000 00000000 00001100 01100000
then we add `a` to get
00000000 00000000 00001100 01100000
+ 01100001
= 00000000 00000000 00001100 11000001
which is then cycled to
00000000 00000001 10011000 00100000
then we add `t` to get
00000000 00000001 10011000 00100000
+ 01110100
= 00000000 00000001 10011000 10010100
we then return this number mod 41893 for 20810
Similarly, for dog
d 01100100
o 01101111
g 01100111
start:
00000000 00000000 00000000 01100100 (d)
cycled and added o:
00000000 00000000 00001100 11101111
cycled and added g:
00000000 00000001 10011110 01000111
ends up at 22269
Because the ASCII range is small, and the cycle algorithm uses the entire space of the unsigned int, it takes long strings to really push the hash into a completely different space. Especially the last character, which really dominates the final modulus operation.
Another way of looking at it: there's very little interaction with a 7-bit ASCII character and the previous 7-bit ASCII character after you shift 5 of those bits away and replace them with 0s, especially for shorter words.
Since the polynomial hash only uses the table size, it's chaotic "faster", even for smaller strings. It doesn't have to fill a whole int before it starts being really chaotic. A single ASCII character is much larger the table size.
That's my guess, anyway. I'd confirm this by checking to see which strings collide. My guess is strings of similar length are colliding the most with the cycle algorithm.
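The walk-through above can be checked mechanically. This is a sketch, not the asker's exact code: it assumes TABLESIZE is 41893 (the modulus used in the worked example) and uses a fixed 32-bit type so the rotation by 5 and 27 is well defined.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLESIZE 41893u   /* assumption: the modulus from the worked example */

/* Cyclic-shift hash: rotate the 32-bit state left by 5, then add the next character. */
unsigned int hashcyclic(const char *p, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++) {
        h = (h << 5) | (h >> 27);
        h += (unsigned char)p[i];
    }
    return h % TABLESIZE;
}

/* Polynomial hash with multiplier 33, reduced modulo the table size at every step. */
unsigned int hashpoly(const char *k)
{
    uint32_t h = 0;
    for (; *k != '\0'; k++)
        h = (33u * h + (unsigned char)*k) % TABLESIZE;
    return h;
}
```

With these definitions, hashcyclic("cat", 3) gives 20810 and hashcyclic("dog", 3) gives 22269, matching the values derived by hand. Logging which keys share a slot for each function would be one way to test the guess about similar-length strings.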

How does Pointer Arithmetic work after Pointer Casting?

#include <stdio.h>
int main() {
short int a[4] = {1,1, [3] = 1};
int *p = (int*)a;
printf("p: %p %d \n ", (void*)p, *p);
printf("p+1: %p %d\n", (void*)(p +1), *(p+1));
}
Why does *p = 65537 and *(p+1) = 65536?
To understand why *p is 65537 and *(p+1) is 65536, let's take a look at the memory:
00000001 00000000 | 00000001 00000000 | 00000000 00000000 | 00000001 00000000
I've separated bytes with a space and short ints with a |. Now we cast the pointer to an int*, so each element it points to takes four bytes instead of two:
00000001 00000000 00000001 00000000 | 00000000 00000000 00000001 00000000
If you enter those binary values into a calculator and let it show the decimal representation, you get exactly those numbers. (This is little-endian, however, so the rightmost byte is the most significant one, which you'd enter first into the calculator.)
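The arithmetic can be reproduced without the pointer cast, which sidesteps the alignment and aliasing concerns of reading short objects through an int*. The function names below are hypothetical; pair_as_le32 models what a little-endian machine with 16-bit short and 32-bit int would read, and load_u32 shows the memcpy alternative to the cast.

```c
#include <stdint.h>
#include <string.h>

/* The 32-bit value a little-endian machine sees when two adjacent
 * 16-bit values are read as one 32-bit integer: the first value
 * supplies the low half, the second the high half. */
uint32_t pair_as_le32(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* Reinterpret four bytes as a uint32_t by copying them,
 * instead of dereferencing a cast pointer. */
uint32_t load_u32(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

For the array {1, 1, 0, 1}, pair_as_le32(1, 1) is 65537 and pair_as_le32(0, 1) is 65536, the two values printed in the question.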

Pointer dereferencing from a (char *) to (int * ) not understood in this example

I was taking practice tests on C on a website, where I happened to see this question.
My doubt is explained in the comments, so please read them.
#include<stdio.h>
int main()
{
int arr[3] = {2, 3, 4}; // its assumed to be stored in little-endian i.e;
// 2 = 00000010 00000000 00000000 00000000
// 3 = 00000011 00000000 00000000 00000000
// 4 = 00000100 00000000 00000000 00000000
char *p;
p = arr;
p = (char*)((int*)(p));
printf("%d ", *p);
p = (int*)(p+1); // This casting is expected to convert char pointer p
// to an int pointer , thus value at p ,now is assumed
// to be equal to 00000000 00000000 00000000 00000011
// but, the output was : 0 . As ,per my assumption it
// should be : 2^24+2^25 = 50331648 ,Please Clarify
// if my assumption is Wrong and explain Why?
printf("%d\n", *p);
return 0;
}
If you would cast p back to int*, then the int value would be:
00000000 00000000 00000000 00000011
where the last byte is the first byte of your second array element. By doing p+1, you're skipping the least significant byte of the first element.
Remember that p remains a char pointer, so assigning an int* to it will not change its type.
When you printf the char at p+1, you are printing the value of the second byte, which is 0.
p = (char*)((int*)(p));
// p is declared as char*, so after this assignment it still points to single bytes.
printf("%d, ", *p); // *p is the first byte of arr[0], which is 2 on a little-endian machine, so 2 is printed.
p = (int*)(p+1); // p is a char*, so p+1 advances the address by only one byte.
Assuming, for brevity, that an int requires 2 bytes of storage,
the integer array will be stored in memory as
value:  2                 3                 4
bytes:  00000010 00000000 00000011 00000000 00000100 00000000
                 ^ p+1
so p+1 points to the high-order byte of the first element, which holds no set bits because the value 2 fits entirely in the low-order byte.
So p+1 points to 00000000.
(int*)(p+1) // the cast produces an int*, but the result is assigned to p, a char*, so it is converted straight back
printf("%d", *p); // prints 0, because *p reads the single zero byte at that address.
Remember p is still a char-pointer. So *p fetches a char value from it. The char value is then promoted to an int when passed as an argument to a variadic function (like printf).
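To see both numbers side by side, the byte layout can be written out explicitly. This sketch assumes 4-byte little-endian ints, as in the question's comments; read_le32 is a hypothetical helper that assembles the int a little-endian machine would read at a given byte address.

```c
#include <stdint.h>

/* Assemble the 32-bit value stored at b in little-endian order:
 * b[0] is the least significant byte, b[3] the most significant. */
uint32_t read_le32(const unsigned char *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With the bytes of {2, 3, 4} laid out as 02 00 00 00 03 00 00 00 04 00 00 00, the single byte at offset 1 is 0 (what the program prints), while read_le32 at offset 1 yields 50331648, the value the asker expected from an int-sized read.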

Wrong number produced when memcpy-ing data into an integer?

I have a char buffer like this
char *buff = "aaaa0006france";
I want to extract the bytes 4 to 7 and store it in an int.
int i;
memcpy(&i, buff+4, 4);
printf("%d ", i);
But it prints junk values.
What is wrong with this?
The string
0006
does not have the same binary representation as the integer 6. Instead, its bit representation is as four ASCII characters representing the glyph 0, the glyph 0, the glyph 0, then the glyph 6. This has hex representation
0x30303036
If you try blindly reinterpreting these bytes as a number on a little-endian system, you get back 909,127,728 (0x36303030). On a big-endian system, you'd get 808,464,438 (0x30303036).
If you want to convert a substring of your string into a number, you will need to instead look for a function that converts a string of text into a number. You might want to try something like this:
char digits[5];
/* Copy over the digits in question. */
memcpy(digits, buff + 4, 4);
digits[4] = '\0'; /* Make sure it's null-terminated! */
/* Convert the string to a number. */
int i = strtol(digits, NULL, 10);
This uses the strtol function, which converts a text string into a number, to explicitly convert the text to an integer.
Hope this helps!
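A slightly more defensive version of the same idea is sketched below. The function name parse_field and its 15-character limit are assumptions for illustration; it uses strtol's end pointer and errno to reject fields that are not entirely numeric.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse the len-character decimal field starting at buf[offset].
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * Note: strtol itself also accepts leading whitespace and a sign. */
int parse_field(const char *buf, size_t offset, size_t len, long *out)
{
    char digits[16];                      /* assumption: fields are under 16 characters */
    if (len == 0 || len >= sizeof digits || strlen(buf) < offset + len)
        return -1;
    memcpy(digits, buf + offset, len);
    digits[len] = '\0';                   /* make it a proper C string */

    char *end;
    errno = 0;
    long value = strtol(digits, &end, 10);
    if (errno != 0 || end != digits + len)
        return -1;                        /* overflow, or not all characters consumed */
    *out = value;
    return 0;
}
```

For the buffer in the question, parse_field(buff, 4, 4, &v) stores 6 in v, while asking for the first four characters ("aaaa") fails instead of silently returning 0.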
Here you need to note two things:
How the characters are stored
Endianness of the system
Each character (letter, digit, or special character) is stored as its ASCII code, one byte per character. To memcpy the string (array of characters) "0006" into a 4-byte int variable, we give the address of the string as the source and the address of the int as the destination, as below.
char a[] = "0006";
int b = 0, c = 6;
memcpy(&b, a, 4);
Values of a and b are stored as below.
a 00110110 00110000 00110000 00110000
b 00000000 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
This is because the ASCII value of the character 0 is 48 and that of the character 6 is 54. memcpy copies whatever bytes are present in a into b. After the memcpy, the value of b will be as below.
a 00110110 00110000 00110000 00110000
b 00110110 00110000 00110000 00110000
c 00000000 00000000 00000000 00000110
MSB LSB
Next is endianness. Now suppose we store the value 0006 in the character buffer in another way, as a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 6;. If we now do the memcpy, we get the value 100663296 (0x6000000), not 6, on a little-endian machine. On a big-endian machine you get the value 6.
a 00000110 00000000 00000000 00000000
b 00000110 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
So we need to consider these two problems when writing a function that converts numeric characters to an integer value. A simple solution is to use the existing library function atoi.
The code below might help you:
#include <stdio.h>
#include <stdlib.h> /* atoi */
#include <string.h> /* memcpy */
int main()
{
char *buff = "aaaa0006france";
char digits[5];
memcpy(digits, buff + 4, 4);
digits[4] = '\0';
int a = atoi(digits);
printf("int : %d", a);
return 0;
}

Resources