Binary representation of an unsigned long long in C

I am trying to get the binary form of an unsigned long long and store each bit of it in an array.
I have an input file like this:
0000000000000000 0000000000000000
FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF
3000000000000000 1000000000000001
where each entry is a 64-bit integer represented in hex. I am using an unsigned long long to hold this value then iterating over the bits and attempting to store them in an array, but some of the arrays have bits in the wrong position.
Here is what I have:
char key_in[17];
char plaintext_in[17];
//64-bit long variables to hold the 64-bit hex values in the input file
unsigned long long key, plaintext;
//I read an entry from the file with fscanf
fscanf(infile, "%s %s", key_in, plaintext_in);
//convert the numbers from hex to unsigned long long with strtoull
key = strtoull(key_in, NULL, 16);
plaintext = strtoull(plaintext_in, NULL, 16);
//initialize arrays with 64 positions that will hold the
//binary representation of the key and plaintext
int key_arr[64];
int pt_arr[64];
//fill the arrays with the binary representations
//of the plaintext and the key
int64_to_bin_array(key, key_arr, 64);
int64_to_bin_array(plaintext, pt_arr, 64);
//print both arrays
printArray(key_arr, 64);
printArray(pt_arr, 64);
Here are the functions I created, int64_to_bin_array and printArray:
/* Converts from an unsigned long long into an array of
   integers that form the binary representation of a */
void int64_to_bin_array(unsigned long long a, int *b, int length)
{
    int i;
    for (i = 0; i < length; i++)
    {
        *(b + i) = (a >> i) & 1; //store the ith bit in b[i]
    }
}
/* prints a one-dimensional array given
   a pointer to it, and its length */
void printArray(int *arr, int length)
{
    int i;
    for (i = 0; i < length; i++)
    {
        printf("%d ", *(arr + i));
    }
    printf("\n\n");
}
When I print the array for the third input however, I receive an incorrect result:
input (in hex):
1. 3000000000000000 2. 1000000000000001
output (in binary):
1 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001100
2 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00001000
Can anyone see where I have made a mistake?
EDIT
I get the correct output after both reading and printing in reverse, but my problem is I need the array to have its most significant byte first so I can manipulate it. Any ideas how that can be done? Would I have to reassign it to a new array and copy the elements in reverse?

Try reading it the other way around. Let's take the last octet:
00001100 = 0x0C
00110000 = 0x30 <---
That corresponds to your first octet, 0x30.
For the second number:
00001000 = 0x08
00010000 = 0x10 <---
That corresponds to its first octet, 0x10.
You'll probably get what you expect if you print it like this:
for(i = length - 1; i >= 0; i--)
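If you need the array itself stored most-significant-bit first (per your edit) rather than just printing in reverse, you can index the shift from the top instead. A minimal sketch, written as a variant of your int64_to_bin_array (the _msb name is just for illustration):

void int64_to_bin_array_msb(unsigned long long a, int *b, int length)
{
    int i;
    for (i = 0; i < length; i++)
    {
        /* bit (length - 1 - i) of a goes into b[i], so b[0] holds the MSB */
        b[i] = (int)((a >> (length - 1 - i)) & 1ULL);
    }
}

With this, 0x3000000000000000 fills the start of the array as 0 0 1 1 0 0 0 0 reading left to right, so no separate reversal pass is needed.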

Related

getting values of void pointer while only knowing the size of each element

I'll start by saying I've seen a bunch of posts with similar titles, but none focuses on my question.
I've been tasked with making a function that receives a void* arr, an unsigned int sizeofArray, and an unsigned int sizeofElement.
I managed to iterate through the array with no problem; however, when I try to print out the values or do anything with them, I seem to get garbage unless I specify their type beforehand.
This is my function:
void MemoryContent(void* arr, unsigned int sizeRe, unsigned int sizeUnit)
{
    int sizeArr = sizeRe / sizeUnit;
    for (int i = 0; i < sizeArr; i++)
    {
        printf("%d\n", arr);         // this one prints garbage
        printf("%d\n", *(int*)arr);  // this one prints expected values given the array is of int*
        arr = arr + sizeUnit;
    }
}
The output of this with the following array (int arr[] = {1, 2, 4, 8, 16, 32, -1}) is:
-13296 1
-13292 2
-13288 4
-13284 8
-13280 16
-13276 32
-13272 -1
I realize I have to somehow specify the type. The printf won't actually be used, since I need the binary representation of whatever value is in there (already taken care of in a different function), but I'm still not sure how to get the actual value without casting, while knowing only the size of the element.
Any explanation would be highly appreciated!
Note: the compiler used is gcc, so pointer arithmetic on void* is allowed as used.
Edit for clarification:
The output, after formatting and all that, should look like this for the array from the previous example:
00000000 00000000 00000000 00000001 0x00000001
00000000 00000000 00000000 00000010 0x00000002
00000000 00000000 00000000 00000100 0x00000004
00000000 00000000 00000000 00001000 0x00000008
00000000 00000000 00000000 00010000 0x00000010
00000000 00000000 00000000 00100000 0x00000020
11111111 11111111 11111111 11111111 0xFFFFFFFF
getting values of void pointer while only knowing the size of each element
Not possible.
Say the size is 4. Is the element an int32_t, uint32_t, float, bool, some struct, or enum, a pointer, etc? Are any of the bits padding? The proper interpretation of the bits requires more than only knowing the size.
Code could print out the raw bytes at void *ptr and leave the interpretation to the user.
unsigned char bytes[sizeUnit];
memcpy(bytes, ptr, sizeUnit);
for (size_t i = 0; i < sizeof bytes; i++) {
    printf(" %02X", bytes[i]);
}
Simplifications exist.
OP's code (void* arr, ... arr = arr + sizeUnit;) is not portable, as adding to a void * is not defined by the C standard. Some compilers do allow it, though, treating the pointer as if it were a char pointer.
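As a sketch of a portable variant (an assumption on my part, not the OP's code; the element type is still unknown, so it only dumps raw bytes in hex and keeps the OP's parameter names): cast once to unsigned char * and do the arithmetic in bytes, leaving interpretation to the caller. The byte order you see inside each element is whatever your machine's endianness dictates.

#include <stdio.h>

void MemoryContent(const void *arr, unsigned int sizeRe, unsigned int sizeUnit)
{
    const unsigned char *p = arr;          /* byte-wise view of the array */
    unsigned int n = sizeRe / sizeUnit;    /* number of elements */

    for (unsigned int i = 0; i < n; i++)
    {
        for (unsigned int j = 0; j < sizeUnit; j++)
            printf("%02X ", p[i * sizeUnit + j]);   /* one element per line, in hex */
        printf("\n");
    }
}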

get last 8 bits from int and covert to unsigned char

I am trying to get the last 8 bits from an int and copy them into an unsigned char.
Ex: int 2 -> 00000000 00000000 00000000 00000010
I want to get the 00000010 and copy those bits into an unsigned char.
Can anyone please help me with some information?
unsigned char bottomByte = num & 0xff;
This will mask your num with 1111 1111 - leaving only the bottom eight bits.
Code like
unsigned char bottomByte = num;
will also work fine, assuming num is positive (or unsigned) - it will leave only the bottom byte of num in the unsigned char. But for clarity you should stick to num & 0xff.
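For example, with a made-up value:

int num = 0x12345678;
unsigned char bottomByte = num & 0xff;   /* bottomByte == 0x78 */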

Polynomial Hashing vs Cyclic Polynomial shifting for strings

I am using this function for cyclic shift:
int hashcyclic(char *p, int len)
{
    unsigned int h = 0;
    int i;
    for (i = 0; i < len; i++)
    {
        h = (h << 5) | (h >> 27);
        h += (unsigned int)p[i];
    }
    return h % TABLESIZE;
}
On a text file with around 20K lines (one word per line), the total number of collisions is 45187. On a text file with 40K+ lines (again, one word per line), there are 12922252 (!) collisions with the same algorithm.
With polynomial hashing:
int hashpoly(char *K)
{
    int h = 0, a = 33;
    for (; *K != '\0'; K++)
        h = (a * h + *K) % TABLESIZE;
    return h;
}
Now I'm getting around 25K collisions on the 20K-word file and 901K collisions on the 40K-word file (almost 12 times fewer than with the cyclic shift).
My question is, does this make sense, or is one of my implementations messed up? I was expecting cyclic to be the fastest for my strings (the 40K-word file is a series of 8-letter words separated by newlines), but the polynomial hash produces significantly fewer collisions.
int HashInsertPoly(Table T, KeyType K, InfoType I)
{
    int i;
    int ProbeDecrement;
    i = hashpoly(K);
    ProbeDecrement = p(K);
    while (T[i].Key[0] != EmptyKey)
    {
        totalcol++;
        T[i].Info.col++;
        i -= ProbeDecrement;
        if (i < 0)
            i += TABLESIZE;
    }
    strcpy(T[i].Key, K);
    insertions++;
    /*T[i].Info = I;*/
    return i;
}
The same HashInsert function applies to the hash with cyclic shift, except now I call hashcyclic instead of hashpoly
My hunch is that the variation in plain text words isn't high, and so the cyclic hash isn't chaotic enough.
Let's look at two strings "cat" and "dog".
cat
c 01100011
a 01100001
t 01110100
h starts at
00000000 00000000 00000000 01100011 (c)
and is then cycled to
00000000 00000000 00001100 01100000
then we add `a` to get
00000000 00000000 00001100 01100000
+ 01100001
= 00000000 00000000 00001100 11000001
which is then cycled to
00000000 00000001 10011000 00100000
then we add `t` to get
00000000 00000001 10011000 00100000
+ 01110100
= 00000000 00000001 10011000 10010100
we then return this number mod 41893 for 20810
Similarly, for dog
d 01100100
o 01101111
g 01100111
start:
00000000 00000000 00000000 01100100 (d)
cycled and added o:
00000000 00000000 00001100 11101111
cycled and added g:
00000000 00000001 10011110 01000111
ends up at 22269
Because the ASCII range is small, and the cycle algorithm uses the entire space of the unsigned int, it takes long strings to really push the hash into a completely different space. Especially the last character, which really dominates the final modulus operation.
Another way of looking at it: there's very little interaction between a 7-bit ASCII character and the previous 7-bit ASCII character once the previous one has been shifted up by 5 bits and the vacated low bits filled with 0s, especially for shorter words.
Since the polynomial hash reduces modulo the table size at every step, it's chaotic "faster", even for smaller strings. It doesn't have to fill a whole int before it starts being really chaotic: after only a few characters the running value already exceeds the table size and wraps around.
That's my guess, anyway. I'd confirm this by checking to see which strings collide. My guess is strings of similar length are colliding the most with the cycle algorithm.
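One way to test that guess is to count bucket occupancy directly, independent of the probing in HashInsert. A rough sketch (it assumes the words are already loaded into an array, reuses the TABLESIZE of 41893 from the worked example above, and takes hashcyclic's signature; hashpoly would need a one-line wrapper since it takes no length argument):

#include <stdio.h>
#include <string.h>

#define TABLESIZE 41893

static int counts[TABLESIZE];

void report_bucket_collisions(char **words, int nwords, int (*hash)(char *, int))
{
    long collisions = 0;
    int i;

    memset(counts, 0, sizeof counts);
    for (i = 0; i < nwords; i++)
        counts[hash(words[i], (int)strlen(words[i]))]++;

    /* every word beyond the first in a bucket counts as a collision */
    for (i = 0; i < TABLESIZE; i++)
        if (counts[i] > 1)
            collisions += counts[i] - 1;

    printf("colliding words: %ld\n", collisions);
}

Printing the actual words in the over-full buckets (or grouping the collisions by word length) would show whether same-length strings really are the main offenders.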

C, Little and Big Endian confusion

I am trying to understand memory byte order in C programming, but I'm confused.
I tested my app against some values on this site to verify my output: www.yolinux.com/TUTORIALS/Endian-Byte-Order.html
For the 64-bit value I use in my C program:
volatile long long ll = (long long)1099511892096;
__mingw_printf("\tlong long, %u Bytes, %u bits,\t%lld to %lli, %lli, 0x%016llX\n", sizeof(long long), sizeof(long long)*8, LLONG_MIN, LLONG_MAX , ll, ll);
void printBits(size_t const size, void const * const ptr)
{
    unsigned char *b = (unsigned char*) ptr;
    unsigned char byte;
    int i, j;
    printf("\t");
    for (i = size - 1; i >= 0; i--)
    {
        for (j = 7; j >= 0; j--)
        {
            byte = b[i] & (1 << j);
            byte >>= j;
            printf("%u", byte);
        }
        printf(" ");
    }
    puts("");
}
Out
long long, 8 Bytes, 64 bits, -9223372036854775808 to 9223372036854775807, 1099511892096, 0x0000010000040880
80 08 04 00 00 01 00 00 (Little-Endian)
10000000 00001000 00000100 00000000 00000000 00000001 00000000 00000000
00 00 01 00 00 04 08 80 (Big-Endian)
00000000 00000000 00000001 00000000 00000000 00000100 00001000 10000000
Tests
0x8008040000010000, 1000000000001000000001000000000000000000000000010000000000000000 // online website hex2bin conv.
1000000000001000000001000000000000000000000000010000000000000000 // my C app
0x8008040000010000, 1000010000001000000001000000000000000100000000010000000000000000 // yolinux.com
0x0000010000040880, 0000000000000000000000010000000000000000000001000000100010000000 //online website hex2bin conv., 1099511892096 ! OK
0000000000000000000000010000000000000000000001000000100010000000 // my C app, 1099511892096 ! OK
[Convert]::ToInt64("0000000000000000000000010000000000000000000001000000100010000000", 2) // using powershell for other verif., 1099511892096 ! OK
0x0000010000040880, 0000000000000000000000010000010000000000000001000000100010000100 // yolinux.com, 1116691761284 (from powershell bin conv.) ! BAD !
Problem
The yolinux.com website announces 0x0000010000040880 for BIG ENDIAN! But my computer uses LITTLE ENDIAN, I think (Intel processor),
and I get the same value, 0x0000010000040880, from my C app and from another website's hex2bin converter.
__mingw_printf(...0x%016llX...,...ll) also prints 0x0000010000040880, as you can see.
Following the yolinux website, I have swapped my "(Little-Endian)" and "(Big-Endian)" labels in my output for the moment.
Also, the sign bit must be 0 for a positive number; that's the case in my result but also in the yolinux result, so it can't help me decide.
If I understand endianness correctly, only bytes are swapped, not bits, and my groups of bits seem to be correctly reversed.
Is it simply an error on yolinux.com, or am I missing a step about 64-bit numbers and C programming?
When you print some "multi-byte" integer using printf (and the correct format specifier) it doesn't matter whether the system is little or big endian. The result will be the same.
The difference between little and big endian is the order that multi-byte types are stored in memory. But once data is read from memory into the core processor, there is no difference.
This code shows how an integer (4 bytes) is placed in memory on my machine.
#include <stdio.h>

int main()
{
    unsigned int u = 0x12345678;

    printf("size of int is %zu\n", sizeof u);
    printf("DEC: u=%u\n", u);
    printf("HEX: u=0x%x\n", u);
    printf("memory order:\n");

    unsigned char * p = (unsigned char *)&u;
    for (int i = 0; i < sizeof u; ++i)
        printf("address %p holds %x\n", (void*)&p[i], p[i]);

    return 0;
}
Output:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 78
address 0x7ffddf2c263d holds 56
address 0x7ffddf2c263e holds 34
address 0x7ffddf2c263f holds 12
So I can see that I'm on a little endian machine as the LSB (least significant byte, i.e. 78) is stored on the lowest address.
Executing the same program on a big endian machine would (assuming same address) show:
size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 12
address 0x7ffddf2c263d holds 34
address 0x7ffddf2c263e holds 56
address 0x7ffddf2c263f holds 78
Now it is the MSB (most significant byte, i.e. 12) that is stored at the lowest address.
The important thing to understand is that this only relates to "how multi-byte type are stored in memory". Once the integer is read from memory into a register inside the core, the register will hold the integer in the form 0x12345678 on both little and big endian machines.
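The same observation gives a quick runtime check. A minimal sketch (my addition, not part of the original answer), which just asks which byte of a known value lands at the lowest address:

#include <stdio.h>

int main(void)
{
    unsigned int u = 1;
    unsigned char first = *(unsigned char *)&u;   /* byte stored at the lowest address */

    /* little endian stores the LSB (1) first; big endian stores a 0 byte first */
    printf("%s endian\n", first == 1 ? "little" : "big");
    return 0;
}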
There is only a single way to represent an integer in decimal, binary or hexadecimal format. For example, number 43981 is equal to 0xABCD when written as hexadecimal, or 0b1010101111001101 in binary. Any other value (0xCDAB, 0xDCBA or similar) represents a different number.
The way your compiler and CPU choose to store this value internally is irrelevant as far as the C standard is concerned; the value could be stored as a 36-bit one's complement if you're particularly unlucky, as long as all operations mandated by the standard have equivalent effects.
You will rarely have to inspect your internal data representation when programming. Practically the only time you care about endianness is when working on a communication protocol, because then the binary format of the data must be precisely defined, but even then your code will not be different regardless of the architecture:
// input value is big endian, this is defined
// by the communication protocol
uint32_t parse_comm_value(const char * ptr)
{
    // bit shifts in C have the same
    // meaning regardless of the endianness
    // of your architecture; cast each byte
    // to unsigned char so values >= 0x80
    // don't sign-extend before the shift
    uint32_t result = 0;
    result |= (uint32_t)(unsigned char)(*ptr++) << 24;
    result |= (uint32_t)(unsigned char)(*ptr++) << 16;
    result |= (uint32_t)(unsigned char)(*ptr++) << 8;
    result |= (uint32_t)(unsigned char)(*ptr++);
    return result;
}
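For example (a fragment of my own, assuming <stdio.h> and <stdint.h> are included and parse_comm_value is in scope), a hypothetical 4-byte message received in the protocol's big-endian order reconstructs to the same value on any host:

const unsigned char msg[4] = { 0x12, 0x34, 0x56, 0x78 };   /* bytes as they arrive on the wire */
uint32_t v = parse_comm_value((const char *)msg);
printf("0x%08lX\n", (unsigned long)v);   /* prints 0x12345678 on both little and big endian hosts */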
Tl;dr calling a standard function like printf("0x%llx", number); always prints the correct value using the specified format. Inspecting the contents of memory by reading individual bytes gives you the representation of the data on your architecture.

Wrong number produced when memcpy-ing data into an integer?

I have a char buffer like this
char *buff = "aaaa0006france";
I want to extract bytes 4 to 7 and store them in an int.
int i;
memcpy(&i, buff+4, 4);
printf("%d ", i);
But it prints junk values.
What is wrong with this?
The string
0006
does not have the same binary representation as the integer 6. Instead, its bytes are the four ASCII characters representing the glyph 0, the glyph 0, the glyph 0, then the glyph 6, i.e. the byte sequence
0x30 0x30 0x30 0x36
If you try blindly reinterpreting these bytes as a number, you get back 909,127,728 (0x36303030) on a little-endian system and 808,464,438 (0x30303036) on a big-endian system.
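You can see this on your own machine with a quick sketch (my addition, assuming a 4-byte int; unsigned int is used only so the hex prints cleanly):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *buff = "aaaa0006france";
    unsigned int i;

    memcpy(&i, buff + 4, 4);          /* copies the bytes '0', '0', '0', '6' */
    printf("%u  0x%08X\n", i, i);
    /* little-endian: 909127728  0x36303030
       big-endian:    808464438  0x30303036 */
    return 0;
}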
If you want to convert a substring of your string into a number, you will need to instead look for a function that converts a string of text into a number. You might want to try something like this:
char digits[5];

/* Copy over the digits in question. */
memcpy(digits, buff + 4, 4);
digits[4] = '\0'; /* Make sure it's null-terminated! */

/* Convert the string to a number. */
int i = strtol(digits, NULL, 10);
This uses the strtol function, which converts a text string into a number, to explicitly convert the text to an integer.
Hope this helps!
Here you need to note two things:
How the characters are stored
Endianness of the system
Each character (letters, digits, or special characters) is stored as a 7-bit ASCII value. While doing a memcpy of the string (array of characters) "0006" into a 4-byte int variable, we have to give the address of the string as the source and the address of the int as the destination, like below.
char a[] = "0006";
int b = 0, c = 6;
memcpy(&b, a, 4);
The values of a, b, and c are stored as below.
a 00110110 00110000 00110000 00110000
b 00000000 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
This is because the ASCII value of the character 0 is 48 and of the character 6 is 54. Now memcpy will copy whatever value is present in a to b. After the memcpy, the value of b will be as below:
a 00110110 00110000 00110000 00110000
b 00110110 00110000 00110000 00110000
c 00000000 00000000 00000000 00000110
MSB LSB
Next is endianness. Now consider that we store the value 0006 in the character buffer in some other way, like a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 6;. Now if we do the memcpy, we will get the value 100663296 (0x6000000), not 6, on a little-endian machine. On a big-endian machine you will get the value 6.
a 00000110 00000000 00000000 00000000
b 00000110 00000000 00000000 00000000
c 00000000 00000000 00000000 00000110
MSB LSB
These are the two problems we need to consider while writing a function that converts numeric characters to an integer value. A simple solution to both is to make use of the existing library function atoi.
The code below might help you:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *buff = "aaaa0006france";
    char digits[5];

    memcpy(digits, buff + 4, 4);
    digits[4] = '\0';

    int a = atoi(digits);
    printf("int : %d", a);
    return 0;
}
