Usage of uint_8, uint_16 and uint_32 - c

Please explain each case in detail, what is happening under the hood and why I am getting 55551 and -520103681 specifically.
typedef uint_8 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :255
typedef uint_16 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :55551
typedef uint_32 BYTE;
BYTE arr[512];
fread(arr, 512, 1, infile);
printf("%i", arr[0]);
OUTPUT :-520103681
I am reading from a file having first four bytes as 255 216 255 244.

In your 3 cases, before the printf statement, the first 4 bytes of the arr array (in hexadecimal) are FF D8 FF E0, which correspond to 255 216 255 224.
Here are explanations of each case:
arr[0] has uint8_t type, so its 1-byte value is 0xFF, and printf("%i", arr[0]); prints 255: the value is promoted to a 4-byte int, which %i prints as a signed decimal integer.
arr[0] has uint16_t type, so it is built from the first 2 bytes of the file, 0xFF 0xD8. With little-endian ordering the second byte (0xD8) is the most significant, so the stored value is 0xD8FF, and printf("%i", arr[0]); prints 55551, which is 0xD8FF as a signed decimal integer after promotion to a 4-byte int.
arr[0] has uint32_t type, so it is built from the first 4 bytes of the file, 0xFF 0xD8 0xFF 0xE0. With little-endian ordering the last byte (0xE0) is the most significant, so the stored value is 0xE0FFD8FF, and printf("%i", arr[0]); prints -520103681, which is what the bit pattern 0xE0FFD8FF represents when read as a signed 4-byte integer (the sign bit is set).
Note: I voluntarily changed "255 216 255 244" from your post into "255 216 255 224". I think you made a typo.
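For reference, here is a minimal sketch (not taken from the question or the answer above) that reproduces all three cases without a file, assuming a little-endian machine and the byte values 0xFF 0xD8 0xFF 0xE0. It copies the same bytes into variables of each width with memcpy and prints them with matching unsigned specifiers, plus one deliberately signed print to reproduce the negative number:
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

int main(void)
{
    /* First four bytes of the file, as described in the question
       (with the corrected last byte 0xE0). */
    const unsigned char bytes[4] = {0xFF, 0xD8, 0xFF, 0xE0};

    uint8_t  a8;
    uint16_t a16;
    uint32_t a32;

    /* memcpy reinterprets the same bytes at each width; on a
       little-endian machine the later bytes become more significant. */
    memcpy(&a8,  bytes, sizeof a8);
    memcpy(&a16, bytes, sizeof a16);
    memcpy(&a32, bytes, sizeof a32);

    printf("%" PRIu8  "\n", a8);   /* 255        (0xFF)       */
    printf("%" PRIu16 "\n", a16);  /* 55551      (0xD8FF)     */
    printf("%" PRIu32 "\n", a32);  /* 3774863615 (0xE0FFD8FF) */

    /* Reinterpreting the 32-bit pattern as a signed int reproduces the
       negative number from the question (implementation-defined, but
       typical on two's-complement systems). */
    printf("%d\n", (int)a32);      /* -520103681 */

    return 0;
}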

You seem to mix up several misunderstandings. Three out of four lines in your examples are questionable, so let's dissect them.
typedef uint_8 BYTE; — why do you have this typedef, and where does uint_8 come from? I suggest you just use uint8_t from stdint.h, and for a minimal example you could skip the typedef to BYTE completely.
The documentation of fread tells us that:
the second parameter is the size of each element to be read
the third parameter is the number of elements to be read
There are other ways to get the values into memory, and to make the program reproducible by copy and paste we can just put the corresponding values into memory directly. If you have problems with fread, that would be a different question.
So it would be one of these lines (your last value has to be 224, not 244, to get -520103681):
uint8_t arr[512] ={0xFF, 0xD8, 0xFF, 0xE0}; //{255, 216, 255, 224}
uint16_t arr[512] = {0xD8FF, 0xE0FF}; // {(216<<8) + 255, (224<<8) + 255}, byte order swapped because of the endianness
uint32_t arr[512] = {0xE0FFD8FF}; // {(224<<24) + (255<<16) + (216<<8) + 255}
Now you can see that the arrays have different sizes, and 16/32-bit values hardly qualify as a BYTE.
In the last line you use printf() wrong. If you look up the conversion and length specifiers for printf() you can see that i is used for a signed int (which is probably 32 bits).
Basically you tell it to read arr[0] (whatever its type is) as a signed int.
This results in the values you see above. The (nearly) correct specifiers would be %hhu for unsigned char, %hu for unsigned short and %u for unsigned int.
But as you use fixed-size types, it would be better to include inttypes.h and use the corresponding specifiers
PRIu8, PRIu16 and PRIu32, like this:
printf("%"PRIu8,arr[0]);
putting them all together yields:
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t arr[512] = {0xE0FFD8FF};
    printf("%" PRIu32, arr[0]);
    return 0;
}
If we eliminate all the problems in the code, we get closer to the actual issue.
Problem 1 might be that you forgot about endianness, so you expected the bytes in a different order.
Also, if you use a signed specifier for printf and the MSB is 1, you get a negative value; that won't happen if you use the correct specifier for unsigned values.

Related

C structure with bits printing in hexadecimal

I have defined a structure as below
struct {
    UCHAR DSatasetMGMT : 1;
    UCHAR AtriburDeallocate : 1;
    UCHAR Reserved6 : 6;
    UCHAR Reserved7 : 7;
    UCHAR DSatasetMGMTComply : 1;
} DatasetMGMTCMDSupport;
It is a 2-byte structure represented in bits. How should I print the whole 2 bytes of the structure in hexadecimal? I tried
"DatasetMGMTCMDSupport : 0x%04X\n"
And
0x%04I64X\n
But not getting expected result.
I am getting 0x3DC18003 with 0x%04X\n while the correct data is 0x8003.
I am using 64 bit windows system.
I need to know how to print 2 byte structure in hexadecimal.
Try using 0x%04hx\n. This tells printf to print only the low two bytes of the value. You can read more about the length field here: https://en.wikipedia.org/wiki/Printf_format_string#Length_field
In contrast, the I64 in 0x%04I64X\n tells printf to print out a 64 bit integer, which is 8 bytes, and 0x%04X\n tells it to print out a default-size integer, which might be 4 bytes on your system.
The width 04 specifies a minimum width. Since the value needs more digits, they are printed.
From a C Standard point of view, you cannot rely on a particular layout of bit fields. Hence, any solution will at best have implementation-defined behaviour.
That being said, your expected output can be obtained. The structure fits in 2 bytes and if you print sizeof(DatasetMGMTCMDSupport) it should give the result 2.
The byte representation of DatasetMGMTCMDSupport can be printed and that is what you were attempting, but since your system has integer size 4, two additional bytes are included. To fix this, the following can be done:
#include <stdint.h>
#include <string.h>
#include <stdio.h>
...
uint16_t a;
memcpy(&a, &DatasetMGMTCMDSupport, sizeof(a));
printf("0x%04X", (unsigned)a);
This copies the 2 bytes of DatasetMGMTCMDSupport into a 2-byte integer variable and prints the hexadecimal representation of those 2 bytes only. If you are on a little-endian system, you should see 0x8003.
A more general approach would be to directly print the bytes of DatasetMGMTCMDSupport:
for (unsigned i = 0; i < sizeof(DatasetMGMTCMDSupport); i++)
{
    printf("%02X", (unsigned)((unsigned char *)&DatasetMGMTCMDSupport)[i]);
}
This will most likely print 0380 (notice the byte order: first byte printed first).
To reverse the byte order is straightforward:
for (unsigned i = 0; i < sizeof(DatasetMGMTCMDSupport); i++)
{
    printf("%02X", (unsigned)((unsigned char *)&DatasetMGMTCMDSupport)[sizeof(DatasetMGMTCMDSupport) - 1 - i]);
}
which should give 8003.
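Putting the pieces together, here is a self-contained sketch (not from the original post; UCHAR is assumed to be unsigned char, and the field values are invented so that the expected pattern 0x8003 appears on a typical little-endian, LSB-first bit-field layout) showing both the memcpy approach and the reversed byte dump:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef unsigned char UCHAR;   /* assumption: UCHAR is unsigned char */

struct {
    UCHAR DSatasetMGMT : 1;
    UCHAR AtriburDeallocate : 1;
    UCHAR Reserved6 : 6;
    UCHAR Reserved7 : 7;
    UCHAR DSatasetMGMTComply : 1;
} DatasetMGMTCMDSupport;

int main(void)
{
    /* Hypothetical field values chosen so the 16-bit pattern is 0x8003
       with a typical bit-field layout (layout is implementation-defined). */
    DatasetMGMTCMDSupport.DSatasetMGMT = 1;
    DatasetMGMTCMDSupport.AtriburDeallocate = 1;
    DatasetMGMTCMDSupport.DSatasetMGMTComply = 1;

    /* Copy the 2 bytes into a fixed-width integer and print it. */
    uint16_t a;
    memcpy(&a, &DatasetMGMTCMDSupport, sizeof(a));
    printf("0x%04X\n", (unsigned)a);   /* expected: 0x8003 */

    /* Print the raw bytes, most significant byte first. */
    for (unsigned i = sizeof(DatasetMGMTCMDSupport); i-- > 0; )
        printf("%02X", (unsigned)((unsigned char *)&DatasetMGMTCMDSupport)[i]);
    printf("\n");                      /* expected: 8003 */

    return 0;
}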

Char automatically converts to int (I guess)

I have the following code
char temp[] = { 0xAE, 0xFF };
printf("%X\n", temp[0]);
Why is the output FFFFFFAE, and not just AE?
I tried
printf("%X\n", 0b10101110);
And output is correct: AE.
Suggestions?
The answer you're getting, FFFFFFAE, is a result of the char data type being signed. If you check the value, you'll notice that it's equal to -82, where -82 + 256 = 174, or 0xAE in hexadecimal.
The reason you get the correct output when you print 0b10101110 or even 174 is that you're using the literal values directly, whereas in your example you're first storing 0xAE in a signed char, where it wraps around (in effect, is reduced modulo 256) into the range -128..127.
So in other words (value stored → value of the signed char → what %X prints):
0 = 0 = 0x00
127 = 127 = 0x7F
128 = -128 = 0xFFFFFF80
129 = -127 = 0xFFFFFF81
174 = -82 = 0xFFFFFFAE
255 = -1 = 0xFFFFFFFF
256 = 0 = 0x00
To fix this "problem", you could declare the same array you initially did, just make sure to use an unsigned char type array and your values should print as you expect.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned char temp[] = { 0xAE, 0xFF };

    printf("%X\n", temp[0]);
    printf("%d\n\n", temp[0]);

    printf("%X\n", temp[1]);
    printf("%d\n\n", temp[1]);

    return EXIT_SUCCESS;
}
Output:
AE
174
FF
255
https://linux.die.net/man/3/printf
According to the man page, %x or %X accepts an unsigned integer, so printf reads a full unsigned int (typically 4 bytes) for that argument.
In any case, under most architectures you can't pass a parameter that is smaller than a word (i.e. int or long) in size, and in your case it will be converted to int.
In the first case, you're passing a char, so it is converted to int. Both are signed, so sign extension is performed, and that's why you see the leading FFs.
In your second example, you're actually passing an int all the way, so no cast is performed.
If you'd try:
printf("%X\n", (char) 0b10101110);
You'd see that FFFFFFAE will be printed.
When you pass a smaller-than-int data type (as char is) to a variadic function (as printf(3) is), the parameter undergoes the default argument promotions: it is converted to int (or to unsigned int if int cannot represent all its values). What you observe is sign extension: as the most significant bit of the char variable is set, it is replicated into the three extra bytes needed to fill an int.
To solve this and keep the data in 8 bits, you have two possibilities:
Allow your signed char to convert to an int (with sign extension), then mask off bits 8 and above.
printf("%X\n", (int) my_char & 0xff);
Declare your variable as unsigned char, so it promotes to a non-negative value and no sign extension occurs.
unsigned char my_char;
...
printf("%X\n", my_char);
This code causes undefined behaviour. The argument to %X must have type unsigned int, but you supply char.
Undefined behaviour means that anything can happen; including, but not limited to, extra F's appearing in the output.

Output of the following C code

What will be the output of the following C code, assuming it runs on a little-endian machine where short int takes 2 bytes and char takes 1 byte?
#include <stdio.h>

int main() {
    short int c[5];
    int i = 0;
    for (i = 0; i < 5; i++)
        c[i] = 400 + i;
    char *b = (char *)c;
    printf("%d", *(b + 8));
    return 0;
}
In my machine it gave
-108
I don't know whether my machine is little-endian or big-endian. I found somewhere that it should give
148
as the output, because the low-order 8 bits of 404 (i.e. element c[4]) are 148. But I think that because of "%d", it should read 2 bytes from memory starting at the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
The default value is dependent on the platform and compiler settings. You can control the default behavior with GCC options -fsigned-char and -funsigned-char.
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
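To see the layout concretely, here is a small sketch (mine, assuming 2-byte shorts and CHAR_BIT == 8) that dumps every byte of the array; on a little-endian machine bytes 8 and 9 print as 148 and 1, because c[4] == 404 == 0x0194:
#include <stdio.h>

int main(void)
{
    short int c[5];
    for (int i = 0; i < 5; i++)
        c[i] = 400 + i;

    /* Inspect the raw bytes through an unsigned char pointer,
       which is always well-defined and never sign-extends. */
    const unsigned char *b = (const unsigned char *)c;
    for (size_t i = 0; i < sizeof c; i++)
        printf("byte %2zu: %3u (0x%02X)\n", i, (unsigned)b[i], (unsigned)b[i]);

    return 0;
}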
To see what sizes the types have on your system (note that sizeof yields a size_t, so the right specifier is %zu):
printf("char = %zu\n", sizeof(char));
printf("short = %zu\n", sizeof(short));
printf("int = %zu\n", sizeof(int));
printf("long = %zu\n", sizeof(long));
printf("long long = %zu\n", sizeof(long long));
Change these lines in your program:
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And here is a simple endianness test (I know it is not guaranteed by the standard, but all C compilers I know behave this way, and I do not care about old CDC or UNISYS machines which had different sizes of addresses and pointers for different types of data):
printf(" endianness test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400 ? "little" : "big");
Another remark: this test only works because in your program c[0] == 400.

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>

int main()
{
    int *ptr = "AAAA";
    printf("%d\n", ptr[0]);
    return 0;
}
the result of this code is 1094795585
Could anybody explain this behavior and why the code gives this answer?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if compiler likes so), as ptr[0] dereferences ptr, thus violating strict aliasing rule.
Note that it is valid to convert int * ---> char * and read the object through the char *, but not the other way around.
the result of this code is 1094795585
The result makes sense, but to get it you need to rewrite your program in a valid form. It might look like:
#include <stdio.h>
#include <string.h>

union StringInt {
    char s[sizeof("AAAA")];
    int n[1];
};

int main(void)
{
    union StringInt si;
    strcpy(si.s, "AAAA");
    printf("%d\n", si.n[0]);
    return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it doesn't really matter here, since every byte is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then, you may deduce, that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 ...
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3 but that's wrong since a byte1 can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte1, so two hex digits is one full byte1 (cf. 0x41). Whereas in decimal, 255 fits in a single byte1, but 256 does not.
1 assuming CHAR_BIT == 8
65656565 is a wrong representation of the value of "AAAA": you are representing each character separately, whereas "AAAA" is stored as an array of bytes. It converts into 1094795585 because the %d specifier prints the decimal value. Run this in gdb with the following commands:
x/8xb (pointer) //this will show you the memory hex value
x/d (pointer) //this will show you the converted decimal value
@zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same thing in an almost correct way:
#include <stdio.h>

int main()
{
    int i, val;
    char *pt = (char *) &val;   // casting a pointer to any object type to a pointer to char: valid
    for (i = 0; i < sizeof(int); i++)
        pt[i] = 'A';            // assigning the bytes of an int: UB in the general case
    printf("%d 0x%x\n", val, val);
    return 0;
}
Assigning bytes of an int is UB in the general case because C standard says that [for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. And a remark adds Some combinations of padding bits might generate trap representations, for example, if one padding
bit is a parity bit.
But on common architectures there are no padding bits and all bit patterns correspond to valid numbers, so the operation is valid (but implementation-dependent) on all common systems. It is still implementation-dependent because the size of int is not fixed by the standard, nor is the endianness.
So: on a 32-bit system with no padding bits, the above code will produce
1094795585 0x41414141
independently of endianness.
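As an additional illustration (mine, not from the answer), memcpy gives another well-defined way to reinterpret the bytes, sidestepping both the aliasing and alignment issues; it assumes 4-byte int, ASCII and two's complement, like the union example above:
#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 0;
    /* Copy the first sizeof(int) bytes of the string literal into an int.
       The result depends on endianness, but with "AAAA" every byte is 0x41,
       so any byte order yields 0x41414141. */
    memcpy(&n, "AAAA", sizeof n);
    printf("%d 0x%x\n", n, (unsigned)n);   /* 1094795585 0x41414141 */
    return 0;
}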

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints 7 2 3 0; I expected it to print 1 9 7 1. Can anyone explain why it is printing 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as one binary number, not as a series of separate decimal digits.
197127 (decimal) = 0x00030207 (hex) = 0000 0000 0000 0011 0000 0010 0000 0111 (binary)
Supposing your system uses little-endian ordering, 0x00030207 is stored in memory as the bytes 0x07 0x02 0x03 0x00, which is printed as 7 2 3 0 when you print each byte in turn.
Because with your method you print out the internal representation of the unsigned and not its decimal representation.
Integers or any other data are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you would have represented your integer as decimal inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the representation of the characters as numbers.
The binary representation of 197127 is 0011 0000 0010 0000 0111 (i.e. 0x30207).
The bytes look like 00000111 (7 in decimal), 00000010 (2), 00000011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation. It is stored as a 16/32-bit integer (depending on the platform). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x00030207 (hex) at that memory location. In binary it looks like 11 0000 0010 0000 0111. Note that the byte order in memory is actually reversed (see this link about endianness). So, when you point f at &e, you are referencing the first byte of the numeric value. If you want to represent a number as a string, you should have
char *e = "197127"
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while (e > 0) { digits[c++] = e % 10; e /= 10; }
while (c > 0) { printf("%d\n", digits[--c]); }
As you know, the int type commonly occupies four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010, then 00000011, and finally 00000000. So when you print *f, you get the first byte and obtain 7. After f++, f points to 00000010, and the output is 2. The rest can be deduced by analogy.
The underlying representation of the number e is in binary and if we convert the value to hex we can see that the value would be(assuming 32 bit unsigned int):
0x00030207
so when you iterate over the contents you are reading byte by byte through the unsigned char *. Each byte corresponds to two 4-bit hex digits, and the byte order is little-endian since the least significant byte (0x07) comes first, so in memory the contents look like this:
0x07020300
  ^ ^ ^ ^
  | | | |- Fourth byte
  | | |- Third byte
  | |- Second byte
  |- First byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To have the result you expect, you should change the declaration and assignment of e to:
unsigned char *e = (unsigned char *)"197127";
unsigned char *f = e;
Or convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *)s;
Or use mathematical operations to extract the single decimal digits from your integer and print those out.
Or, ...
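For completeness, a small sketch (mine, not from the answer) of the sprintf route, which prints each decimal digit of 197127 on its own line (1 9 7 1 2 7):
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    char s[32];

    /* Render the number as decimal text, then walk the characters. */
    sprintf(s, "%u", e);
    for (const char *p = s; *p != '\0'; p++)
        printf("%c\n", *p);

    return 0;
}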
