Why does casting a hex int to a char* print it backwards? - c

I thought I understood how memory works until I ran this code. Is memory backwards, or am I missing something?
Code:
#include <stdio.h>

int main()
{
    int a = 0x12345678;
    char *c = (char *)&a;

    for (int i = 0; i < 4; i++)
    {
        printf("c[%d]=%x \n", i, *(c + i));
    }
    return 0;
}
Output:
c[0]=78
c[1]=56
c[2]=34
c[3]=12

What you have just done is demonstrate which "endian" your computer's architecture is using (i.e., your computer uses "little endian", not "big endian").
If your computer's architecture had instead been "big endian", then your output would instead have been this:
c[0] = 12
c[1] = 34
c[2] = 56
c[3] = 78
You may want to read this for more information: https://en.wikipedia.org/wiki/Endianness
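If you want output that does not depend on the host's byte order, extract the bytes arithmetically with shifts instead of through a pointer. A minimal sketch (assuming a 32-bit unsigned int, as in the question):
#include <stdio.h>

int main(void)
{
    unsigned int a = 0x12345678;

    /* Shifting selects bytes by value, not by memory position,
       so this prints 12 34 56 78 on any machine. */
    for (int i = 3; i >= 0; i--)
        printf("%02x ", (a >> (i * 8)) & 0xff);
    printf("\n");
    return 0;
}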

It is because of the endianness of your machine. From the Wikipedia article on endianness (https://en.wikipedia.org/wiki/Endianness):
A big-endian system stores the most significant byte of a word at the
smallest memory address and the least significant byte at the largest.
A little-endian system, in contrast, stores the least-significant byte
at the smallest address.
Your machine is little-endian.

Related

memcpy long long int (casting to char*) into char array

I was trying to split a long long into 8 characters, where the first 8 bits become the first character, the next 8 bits the second, and so on.
I used two methods. First, I shifted and cast the type, and it went well.
But I failed when using memcpy: the result was reversed (the first 8 bits became the last character). Shouldn't the memory be consecutive and in the same order? Or am I messing something up?
void num_to_str(void){
    char str[100005] = {0};
    unsigned long long int ans = 0;

    scanf("%llu", &ans);
    for(int j = 0; j < 8; j++){
        str[j] = (unsigned char)(ans >> (56 - 8 * j));  /* most significant byte first */
    }
    printf("%s\n", str);
    return;
}
This works great:
input : 8102661169684245760
output : program
However, the following doesn't act as I expected.
void num_to_str(void){
    char str[100005] = {0};
    unsigned long long int ans = 0;

    scanf("%llu", &ans);
    memcpy(str, (char *)&ans, 8);   /* memcpy needs <string.h> */
    for(int i = 0; i < 8; i++)
        printf("%c", str[i]);
    return;
}
This works unexpectedly:
input : 8102661169684245760
output : margorp
PS: I couldn't even use printf("%s", str) or puts(str);
I assume the first character was stored as '\0'.
I am a beginner, so I'd be grateful if someone could help me out.
The order of bytes within a binary representation of a number within an encoding scheme is called endianness.
In a big-endian system bytes are ordered from the most significant byte to the least significant.
In a little-endian system bytes are ordered from the least significant byte to the most significant one.
There are other byte orders, but they are considered esoteric nowadays, so you won't find them in practice.
If you run your program on a little-endian system (e.g. x86), you get your results.
You can read more:
https://en.wikipedia.org/wiki/Endianness
You may wonder why anyone sane would design and use a little-endian system, where bytes are reversed from the way we humans write numbers (we write the most significant digit first, i.e. big-endian). But there are advantages. You can read about some here: The reason behind endianness?
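If it helps, here is the shift loop from your first version factored into a small helper that always stores the most significant byte first, regardless of the host's endianness (a sketch; store_be64 is just an illustrative name):
#include <stdint.h>
#include <stdio.h>

/* Store a 64-bit value most significant byte first, independent of the
   host's byte order, by extracting bytes arithmetically. */
void store_be64(unsigned char *out, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (56 - 8 * i));
}

int main(void)
{
    unsigned char buf[9] = {0};

    store_be64(buf, 8102661169684245760ULL);
    printf("%s\n", buf); /* prints "program", as in the question */
    return 0;
}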

C - access memory directly using address?

I've been lightly studying C for a few weeks now with some book.
#include <stdio.h>

int main(void)
{
    float num = 3.15;
    int *ptr = (int *)&num; // so the bit pattern can be shifted and masked below

    for (int i = 0; i < 32; i++)
    {
        if (!(i % 8) && (i / 8))
            printf(" ");
        printf("%d", *ptr >> (31 - i) & 1);
    }
    return 0;
}
output : 01000000 01001001 10011001 10011010
As you see 3.15 in single precision float is 01000000 01001001 10011001 10011010.
So let's say ptr points to address 0x1efb40.
Here are the questions:
As I understood from the book, the first 8 bits of num's data are stored at 0x1efb40, the 2nd 8 bits at 0x1efb41, the next 8 bits at 0x1efb42, and the last 8 bits at 0x1efb43. Am I right?
If I'm right, is there any way I can directly access the 2nd 8 bits through the hex address value 0x1efb41, and thereby change the data to something like 11111111?
The ordering of bytes within a datatype is known as endianness and is system specific. What you describe with the least significant byte (LSB) first is called little endian and is what you would find on x86 based processors.
As for accessing particular bytes of a representation, you can use a pointer to an unsigned char to point to the variable in question to view the specific bytes. For example:
float num = 3.15;
unsigned char *p = (unsigned char *)&num;
int i;

for (i = 0; i < sizeof(num); i++) {
    printf("byte %d = %02x\n", i, p[i]);
}
Note that accessing the bytes this way is only allowed through a character pointer, not an int *, as the latter violates strict aliasing.
The code you wrote is not actually valid C. C has a rule called "strict aliasing," which states that if a region of memory contains a value of one type (i.e. float), it cannot be accessed as though it was another type (i.e. int). This rule has its origins in some performance optimizations that let the compiler generate faster code. I can't say it's an obvious rule, but it's the rule.
You can work around this by using a union. If you make a union like union { float num; int numAsInt; }, you can store a float and then read it as an integer. The result is unspecified. Alternatively, you are always permitted to access the bytes of a value as chars (just not as anything larger). char is given special treatment (presumably so you can copy a buffer of data as bytes, then cast it to your data's type and access it, which happens a lot in low-level code like network stacks).
Welcome to a fun corner of learning C. There's unspecified behavior and undefined behavior. Informally, unspecified behavior says "we won't say what happens, but it will be reasonable." The C spec will not say what order the bytes are in. But it will say that you will get some bytes. Undefined behavior is nastier. Undefined behavior says anything can happen, ranging from compiler errors to exceptions at runtime, to absolutely nothing at all (making you think your code is valid when it is not).
As for the values, dbush points out in his answer that the order of the bytes is defined by the platform you are on. You are seeing a "little endian" representation of an IEEE 754 floating-point number. On other platforms, it may be different.
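Besides a union, the other well-defined way to inspect the representation is memcpy into an integer of the same size. A minimal sketch (assuming float and uint32_t are both 4 bytes, which is worth a static assertion in real code):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float num = 3.15f;
    uint32_t bits;

    /* memcpy copies the object representation byte by byte, which is
       well-defined, unlike dereferencing an int * aimed at a float. */
    memcpy(&bits, &num, sizeof bits);
    printf("0x%08x\n", bits); /* 0x4049999a for IEEE 754 single precision */
    return 0;
}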
Union punning is much safer:
#include <stdio.h>

typedef union
{
    unsigned char uc[sizeof(double)];
    float f;
    double d;
} u_t;

void print(u_t u, size_t size, int endianness)
{
    size_t start = 0;
    int increment = 1;

    if (endianness)
    {
        start = size - 1;
        increment = -1;
    }
    for (size_t index = 0; index < size; index++)
    {
        printf("%hhx ", u.uc[start]);
        start += increment;
    }
    printf("\n");
}

int main(void)
{
    u_t u;

    u.f = 3.15f;
    print(u, sizeof(float), 0);
    print(u, sizeof(float), 1);
    u.d = 3.15;
    print(u, sizeof(double), 0);
    print(u, sizeof(double), 1);
    return 0;
}
you can test it yourself: https://ideone.com/7ABZaj

Output of the following C code

What will be the output of the following C code? Assume it runs on a little-endian machine, where a short int takes 2 bytes and a char takes 1 byte.
#include <stdio.h>

int main() {
    short int c[5];
    int i = 0;

    for (i = 0; i < 5; i++)
        c[i] = 400 + i;

    char *b = (char *)c;
    printf("%d", *(b + 8));
    return 0;
}
On my machine it gave
-108
I don't know if my machine is little endian or big endian. I found somewhere that it should give
148
as the output, because the low-order 8 bits of 404 (i.e. element c[4]) are 148. But I thought that, due to "%d", it should read 2 bytes from memory starting at the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
Whether plain char is signed or unsigned by default depends on the platform and compiler settings. You can control the default behavior with the GCC options -fsigned-char and -funsigned-char.
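A quick way to check what your compiler does, using CHAR_MIN from <limits.h> (a minimal sketch):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 if plain char is unsigned, negative if signed. */
    printf("char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}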
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
To see what sizes the types have on your system:
printf("char      = %zu\n", sizeof(char));
printf("short     = %zu\n", sizeof(short));
printf("int       = %zu\n", sizeof(int));
printf("long      = %zu\n", sizeof(long));
printf("long long = %zu\n", sizeof(long long));
Change these lines in your program:
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And here is a simple test (I know it is not guaranteed, but all C compilers I know behave this way, and I do not care about old CDC or UNISYS machines which had different address formats and pointers for different types of data):
printf(" endianness test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400 ? "little" : "big");
Another remark: this test only works because in your program c[0] == 400.

Casting uint8_t array into uint16_t value in C

I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.
For example,
#include <stdint.h>
#include <stdio.h>
int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;

    printf("%x\n", (unsigned int)b);
    return 0;
}
prints aa15 instead of 15aa (which is what I would expect).
What's the reason behind this, and is there an easy fix?
I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.
As mentioned in the comments, this is due to endianness.
Your machine is little-endian, which (among other things) means that multi-byte integer values have the least significant byte first.
If you compiled and ran this code on a big-endian machine (ex. a Sun), you would get the result you expect.
Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:
uint16_t b = ntohs(*(uint16_t*)a);
There are similar functions called ntohl and htonl that work on 32-bit values.
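Putting it together, here is a sketch that copies the bytes with memcpy (avoiding the pointer cast) and then fixes the byte order with ntohs. It assumes a POSIX system, where ntohs lives in <arpa/inet.h>:
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b;

    memcpy(&b, a, sizeof b); /* copy the bytes without an aliasing cast */
    b = ntohs(b);            /* reinterpret them as big-endian */
    printf("%x\n", b);       /* prints 15aa on any host */
    return 0;
}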
This is because of the endianness of your machine.
To make your code independent of the machine, consider the following function:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
So for each case you can choose which operation to apply.
You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.
A correct version of the code could be:
b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
The version suggested in your question involving a[0] << 8 is incorrect because on a system with 16-bit int, this may cause signed integer overflow: a[0] promotes to int, and << 8 means * 256.
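For completeness, here is that line as a full program (a sketch; CHAR_BIT comes from <limits.h>):
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};

    /* The cast to uint16_t means the operand promotes to unsigned int on
       systems where int is 16 bits, avoiding signed overflow in the shift. */
    uint16_t b = ((uint16_t)a[0] << CHAR_BIT) + a[1];

    printf("%x\n", b); /* prints 15aa regardless of host endianness */
    return 0;
}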
This might help to visualize things. When you create the array you have two bytes in order. When you print it you get the human readable hex value which is the opposite of the little endian way it was stored. The value 1 in little endian as a uint16_t type is stored as follows where a0 is a lower address than a1...
a0 a1
|10000000|00000000
Note that the least significant byte comes first in memory, but when we print the value in hex the least significant byte appears on the right, which is what we normally expect on any machine. (In the diagram above, the bits within each byte are also shown least significant first, matching the program below.)
This program prints a little-endian and a big-endian 1 in binary, starting from the least significant bit...
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

void print_bin(uint64_t num, size_t bytes) {
    /* Print bits least significant first, with a '|' before each byte. */
    for (int i = bytes * 8; i > 0; i--) {
        if (i % 8 == 0)
            printf("|");
        printf("%c", (num & 1) ? '1' : '0');
        num >>= 1;
    }
    printf("\n");
}

int main(void) {
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    uint16_t le = 1;
    uint16_t be = htons(le);

    printf("Little Endian 1\n");
    print_bin(le, 2);
    printf("Big Endian 1 on little endian machine\n");
    print_bin(be, 2);
    printf("0xaa15 as little endian\n");
    print_bin(b, 2);
    return 0;
}
This is the output (printed least significant bit first):
Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101

Treating a character array as an integer - Learn C the Hard Way Extra credit

In Zed Shaw's "Learn C the Hard Way", exercise 9 (http://c.learncodethehardway.org/book/ex9.html) there is an extra credit question that I find interesting. He defines a 4-character array and asks the reader to figure out how to use the array as a 4-byte integer.
At this point I know just enough to be dangerous, and I was thinking the answer is something along these lines:
#include <stdio.h>

int main(int argc, char *argv[])
{
    char name[4] = {'A'};
    int *name_int;

    name_int = (int *)name;   /* cast needed: the array is not an int * */
    printf("%d", *name_int);
    return 0;
}
My thought was that if I created an int pointer whose value was the address of the array, the int type would use the byte of data at that address, followed by the next 3 bytes. In my limited understanding, I am under the impression that both an int and an array use memory the same way: starting at some memory address and then using the following addresses in sequence, and so on.
However, the output isn't what I expected: I get the ASCII value of 'A'. That seems to indicate that my solution is incorrect, my understanding of how memory is handled is incorrect, or both.
How can this little hack be accomplished, and where am I going wrong? I am hoping to walk away from this with a better understanding of how pointers and references work, and how memory is stored and used.
Thank you!
You are running into little-endian vs big-endian representation of numbers.
Let's take a look at the values of the 4 bytes used to represent a 4-byte integer.
+----+----+----+----+
| N1 | N2 | N3 | N4 |
+----+----+----+----+
In a big-endian representation, these 4 bytes represent:
N1*2^24 + N2*2^16 + N3*2^8 + N4
In a little-endian representation, those 4 bytes represent:
N1 + N2*2^8 + N3*2^16 + N4*2^24
In your case:
N1 = 'A' (65 decimal)
N2 = 0
N3 = 0
N4 = 0
Since the integer value you are getting is 65, you have a little-endian representation. If you want to treat those bytes as a big-endian representation, you can use the following:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;
    char nameString[4] = {'A'};
    int name = 0;

    for (i = 0; i < 4; ++i)
    {
        name = (name << 8) + nameString[i];
    }
    printf("%d\n", name);
    printf("%X\n", name);
    return 0;
}
The output I get with the above code:
1090519040
41000000
You may also try the function memcpy().
Use the char array as the source and an unsigned int variable as the destination.
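A minimal sketch of that suggestion (assuming a 4-byte unsigned int; the value you get back still depends on the host's byte order):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[4] = {'A'}; /* remaining elements are zero-filled */
    unsigned int n = 0;

    memcpy(&n, name, sizeof n); /* copy the array's bytes into the int */
    printf("%u\n", n);          /* 65 little-endian, 1090519040 big-endian */
    return 0;
}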
