Split up two byte char into two single byte chars - c

I have a char of value say 0xB3, and I need to split this into two separate chars, so X = 0xB and Y = 0x3. I've tried the following code:
int main ()
{
    char addr = 0xB3;
    char *p = &addr;
    printf ("%c, %c\n", p[0], p[1]); //This prints ?, Y
    printf ("%X, %X\n", p[0], p[1]); //This prints FFFFFFB3, 59
    return 0;
}
Just to clarify, I need to take any 2 byte char of value 00 to FF and split the first and second byte into separate chars. Thanks.

Straight from Wikipedia:
#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)
So HI_NIBBLE(addr) would be 0xB. However, 0x00 through 0xFF are not "double bytes"; they're single-byte values. A single hex digit can take on 16 values, while a byte can take on 256 = 16² of them, so you need two hex digits to represent arbitrary byte values.
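For completeness, here is a minimal, self-contained sketch (not part of the original answer) showing the macros applied to the asker's value:

#include <stdio.h>

#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)

int main (void)
{
    unsigned char addr = 0xB3;
    /* prints: high nibble: 0xb, low nibble: 0x3 */
    printf("high nibble: %#x, low nibble: %#x\n", HI_NIBBLE(addr), LO_NIBBLE(addr));
    return 0;
}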

There are quite a few problems here; let's take a look at your code:
int main ()
{
    char addr = 0xB3;   <-- you're assigning 0xB3 in hex (179 in decimal) to addr
    char *p = &addr;    <-- you're assigning a pointer to point to addr
If addr were unsigned, it would now be set to 179, the extended ASCII code of │ (a box-drawing character).
A char can hold -128 to +127 if it's signed (on a typical two's-complement machine), or 0 to 255 if it's unsigned. Here (according to your output) it's signed, so 0xB3 doesn't fit and the stored value ends up negative.
    printf ("%c, %c\n", p[0], p[1]);   <-- print the char value of what p is pointing to,
                                           and invoke undefined behaviour for p[1]
    printf ("%X, %X\n", p[0], p[1]);   <-- print the hex value of what p is pointing to,
                                           again with undefined behaviour for p[1]
So this part of your code prints the char value of your out-of-range addr variable, which happens to show up as '?' for you. The hex value of addr is FFFFFFB3, indicating you have a negative value (the uppermost bit is the sign bit).
This: p[0] is really an "add and dereference" operation, meaning: take the address in p, add 0 to it, then dereference and look at the result:

 p ----------+
             |
             V
 --------------------------------------------
 | addr(0xB3) |     ?      |     ?      | ...
 --------------------------------------------
  0xbfd56c2b   0xbfd56c2c   0xbfd56c2d   ...

When you do p[1], this goes one char (one byte) past addr and gives you whatever is there. What's there? We don't know; it's outside the object you declared:

 p+1 ---------------------+
                          |
                          V
 --------------------------------------------
 | addr(0xB3) |     ?      |     ?      | ...
 --------------------------------------------
  0xbfd56c2b   0xbfd56c2c   0xbfd56c2d   ...

Y's ASCII value (in hex) is 0x59, so behind your pointer in memory there happened to be a Y. But it could have been anything; what it was going to be is undefined. A correct way to do this would be:
#include <stdio.h>

int main ()
{
    unsigned char addr = 0xB3;
    char low  = addr & 0x0F;
    char high = (addr >> 4) & 0x0F;
    printf("%#x becomes %#x and %#x\n", addr, high, low);
    return 0;
}
This works via:

    0xB3       =>   1011 0011          0xB3 >> 4  =   0000 1011
                  & 0000 1111                       & 0000 1111
                    ---------                         ---------
                    0000 0011 => 3 (low)              0000 1011 => B (high)
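If you need this in more than one place, the same mask-and-shift logic can be wrapped in a small helper. This is just an illustrative sketch; the name split_byte is mine, not something from the answer above:

#include <stdio.h>

/* Writes the high and low nibbles of b into *high and *low. */
static void split_byte (unsigned char b, unsigned char *high, unsigned char *low)
{
    *high = (b >> 4) & 0x0F;   /* upper 4 bits */
    *low  = b & 0x0F;          /* lower 4 bits */
}

int main (void)
{
    unsigned char high, low;
    split_byte(0xB3, &high, &low);
    printf("%#x splits into %#x and %#x\n", 0xB3, high, low);   /* 0xb3 splits into 0xb and 0x3 */
    return 0;
}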

Why do you need to go through a pointer? Just take the 4 relevant bits and shift the most significant ones down when needed:
char lower = value & 0x0F;
char higher = (value >> 4) & 0x0F;
Also, 0xB3 is a single byte, not two bytes. Since a hex digit can have 16 values, two digits can store 16*16 = 256 values, which is exactly how much you can store in a byte.

OK, so you're trying to split 0xB3 into 0xB and 0x3. Just for future reference, don't say "byte chars": the two parts of a byte are commonly known as "nibbles"; a byte is made up of 2 nibbles (each made up of 4 bits).
If you didn't know, a char is 1 byte.
So here are the problems with your code:
char addr = 0xB3;                   <---- creates a single byte with value 0xB3 - good
char *p = &addr;                    <---- creates a pointer pointing to addr    - good
printf ("%c, %c\n", p[0], p[1]);    <---- p[0], p[1] - bad
printf ("%X, %X\n", p[0], p[1]);    <---- p[0], p[1] - bad
When you refer to p[0] and p[1], you're telling your system that the pointer p points to an array of chars (p[0] refers to 0xB3, but p[1] goes to the next byte in memory).
Example: this is roughly what your system memory could look like (addresses and contents are made up; pointers are normally 8 bytes, but single bytes are used here to keep the picture small):

            Integer values area                         Pointers area
  0x01   0x02   0x03   0x04   0x05   0x06     0x12   0x13   0x14   0x15   0x16
 ------------------------------------------   ----------------------------------
  ....   ....   0xB3   0x59   ....   ....     ....   ....   0x03   ....   ....
 ------------------------------------------   ----------------------------------
                 ^      ^                                     ^
                addr    |                                     p (an example pointer
                        |                                        holding the example
            a random byte that shows                             address 0x03)
            up as p[1]

So when you tell your system to get p[0] or *p (these do the same thing), it goes to the address p holds (e.g. 0x03) and reads one byte (because it's a char), in this case 0xB3.
But when you try p[1] or *(p+1), it goes to that address, skips the first char and reads the next one, giving us 0x59, which happens to be there for some other variable.
OK, so we've got that out of the way; how do you get the nibbles?
A problem with getting a nibble is that you generally can't put just half a byte into a variable: there's no type that holds just 4 bits.
Also, when you print with %x/%X, leading zero nibbles are not shown, e.g. 0x00230242 would only show as 230242. If you want the leading zeros, add a width and the 0 flag:
%02X shows one full byte (two hex digits, including leading zeros)
%08X shows four full bytes (including leading zeros)
So there isn't much point in trying to store individual nibbles on their own, but if you want to do something like that anyway, you can do:
char addr = 0x3B;
char addr1 = ((addr >> 4) & 0x0F);
char addr2 = ((addr >> 0) & 0x0F);
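A short sketch (assuming a 4-byte unsigned int, as elsewhere in this answer) demonstrating the leading-zero behaviour of %X and the width/zero-flag fix:

#include <stdio.h>

int main (void)
{
    unsigned int v = 0x00230242;
    unsigned char n = 0x0B;    /* a single nibble stored in a byte */

    printf("%X\n", v);         /* prints 230242   - leading zero nibbles are dropped */
    printf("%08X\n", v);       /* prints 00230242 - all four bytes, zero-padded */
    printf("%02X\n", n);       /* prints 0B       - one full byte, zero-padded */
    return 0;
}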

Related

c bit manipulation (endianness)

Could someone explain this code to me, please? I have received some byte code from an assembler and now I have to use it in my virtual machine. This code is used, but I don't know how it works or what it is used for.
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] << 8  |
                      (uint32_t)bytes[3] << 0  ;
    return (int32_t)result;
}
It builds up a 32 bit word from 4 bytes.
For example, if the bytes are: 1st: 0x12, 2nd: 0x34, 3rd: 0x56, 4th: 0x78
Then:
static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 | // -> 0x12000000
                      (uint32_t)bytes[1] << 16 | // -> 0x00340000
                      (uint32_t)bytes[2] << 8  | // -> 0x00005600
                      (uint32_t)bytes[3] << 0  ; // -> 0x00000078
    return (int32_t)result;                      // bitwise ORing this result -> 0x12345678
}
This function attempts to combine the four bytes in a uint8_t[4] into a single uint32_t with big-endian byte order, cast the result into a signed int32_t, and return that.
So, if you pass a pointer to the array { 0xAA, 0xBB, 0xCC, 0xDD } to the function, it will combine them into a 32-bit integer with the most significant bytes of the integer coming from the lowest addresses in the array, giving you 0xAABBCCDD or -1430532899.
However, if the array pointed to by the argument bytes is not at least four bytes long, it has undefined behavior.
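As a usage sketch, here is the function called with the example array from this answer; the inverse function int32_to_bytecode is purely illustrative (my own name, not part of the asker's virtual machine):

#include <stdint.h>
#include <stdio.h>

static int32_t bytecode_to_int32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] << 8  |
                      (uint32_t)bytes[3] << 0  ;
    return (int32_t)result;
}

/* Hypothetical inverse: writes v back out as 4 big-endian bytes. */
static void int32_to_bytecode (int32_t v, uint8_t* bytes)
{
    uint32_t u = (uint32_t)v;
    bytes[0] = (uint8_t)(u >> 24);
    bytes[1] = (uint8_t)(u >> 16);
    bytes[2] = (uint8_t)(u >> 8);
    bytes[3] = (uint8_t)(u >> 0);
}

int main (void)
{
    const uint8_t code[4] = { 0xAA, 0xBB, 0xCC, 0xDD };
    int32_t v = bytecode_to_int32(code);
    printf("%d (0x%08X)\n", v, (unsigned)v);   /* -1430532899 (0xAABBCCDD) */

    uint8_t out[4];
    int32_to_bytecode(v, out);
    printf("%02X %02X %02X %02X\n", out[0], out[1], out[2], out[3]);   /* AA BB CC DD */
    return 0;
}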

Printf in C prints ffffffe1 instead of e1

I am very much confused. I have a small program where I am printing the values at different address locations.
int main ()
{
    // unsigned int x = 0x15711056;
    unsigned int x = 0x15b11056;
    char *c = (char*) &x;
    printf ("*c is: 0x%x\n", *c);
    printf("size of %d\n", sizeof(x));
    printf("Value at first address %x\n", *(c+0));
    printf("Value at second address %x\n", *(c+1));
    printf("Value at third address %x\n", *(c+2));
    printf("Value at fourth address %x\n", *(c+3));
    return 0;
}
For the commented-out unsigned int x, the printf values are as expected, i.e.
printf("Value at first address %x\n", *(c+0)) = 56
printf("Value at second address %x\n", *(c+1))= 10
printf("Value at third address %x\n", *(c+2))= 71
printf("Value at fourth address %x\n", *(c+3))= 15
But for the uncommented x, why am I getting the result below for *(c+2)? It should be b1, not ffffffb1. Please help me understand this. I am running this on an online IDE, https://www.onlinegdb.com/online_c_compiler, and my PC is an Intel i7.
printf("Value at first address %x\n", *(c+0)) = 56
printf("Value at second address %x\n", *(c+1))= 10
printf("Value at third address %x\n", *(c+2))= ffffffb1
printf("Value at fourth address %x\n", *(c+3))= 15
The value is sign-extended: 0xB1 is 10110001 in binary, so its high bit is set and a signed char holding it is negative. You need to use an unsigned char pointer:
unsigned char *c = (unsigned char*) &x;
Your code would work for any bytes up to 0x7F.
c is a pointer to signed char, and 0xB1 is 1011 0001; you can see that
the most significant bit is 1, so as a signed char it's a negative number.
When you pass *(c+2) to printf, it gets promoted to an int which is
signed. Sign extension fills the rest of the bits with the same value as the
most significant bit from your char, which is 1. At this point printf
gets 1111 1111 1111 1111 1111 1111 1011 0001.
%x in printf prints it as an unsigned int, thus it prints 0xFFFFFFB1.
You have to declare your pointer as a pointer to unsigned char:
unsigned char *c = (unsigned char*) &x;
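Both common fixes are shown in this minimal sketch, which assumes the same little-endian layout as the question: either read through an unsigned char pointer, or mask the promoted value with 0xFF.

#include <stdio.h>

int main (void)
{
    unsigned int x = 0x15b11056;

    /* Fix 1: read the bytes through an unsigned char pointer. */
    unsigned char *u = (unsigned char *) &x;
    printf("%x\n", u[2]);            /* prints b1 on a little-endian machine */

    /* Fix 2: keep the (possibly signed) char pointer but mask off
       the sign-extended bits after promotion. */
    char *c = (char *) &x;
    printf("%x\n", *(c + 2) & 0xFF); /* also prints b1 */

    return 0;
}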
unsigned int x = 0x15b11056;   /* let's say the starting address of x is 0x100 */
char *c = (char*) &x;          /* c is a char pointer, i.e. it fetches 1 byte at a time, and it points to 0x100 */
x looks like this in memory (little-endian, most significant byte at the highest address):

    -------------------------------------------------
    | 0001 0101 | 1011 0001 | 0001 0000 | 0101 0110 |
    -------------------------------------------------
        0x103       0x102       0x101       0x100
                                              ^
                                              x starts here, and c points here

Next, when you do *(c+2), let's expand it:

    *(c+2) = *(0x100 + 2*1)   /* c is a char pointer, so each step is 1 byte */
           = *(0x102)
           = 1011 0001 (in binary)

Notice that the sign bit is 1, which means the sign bit will be copied into the remaining bytes.
Since you are printing in %x format, which expects an unsigned int, but *c is a signed char, the value is promoted to int and the sign bit is copied into the remaining bytes.
For *(c+2) the input looks like

    0000 0000 | 0000 0000 | 0000 0000 | 1011 0001

The sign bit is one, so this bit is copied into the remaining bytes, and the result looks like this:

    1111 1111 | 1111 1111 | 1111 1111 | 1011 0001
       f   f      f   f      f   f       b   1

I have explained the particular part you had doubts about; I hope it helps.

Pointers in C with typecasting

#include <stdio.h>
int main()
{
    int a;
    char *x;
    x = (char *) &a;
    a = 512;
    x[0] = 1;
    x[1] = 2;
    printf("%d\n",a);
    return 0;
}
I'm not able to grasp how the output is 513, or even machine dependent. I can sense that typecasting is playing a major role, but what is happening behind the scenes? Can someone help me visualise this problem?
The int a is stored in memory as 4 bytes. The number 512 is represented on your machine as:
0 2 0 0
When you assign to x[0] and x[1], it changes this to:
1 2 0 0
which is the number 513.
This is machine-dependent, because the order of bytes in a multi-byte number is not specified by the C language.
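A small sketch (assuming a 4-byte int, and reading through an unsigned char pointer to avoid the sign-extension issue from the previous question; the helper name dump_bytes is just illustrative) lets you watch the bytes change:

#include <stdio.h>

static void dump_bytes (const char *label, const int *p)
{
    const unsigned char *b = (const unsigned char *) p;
    printf("%s: %02x %02x %02x %02x\n", label, b[0], b[1], b[2], b[3]);
}

int main (void)
{
    int a;
    char *x = (char *) &a;

    a = 512;
    dump_bytes("after a = 512       ", &a);   /* 00 02 00 00 on a little-endian machine */
    x[0] = 1;
    x[1] = 2;
    dump_bytes("after x[0] and x[1] ", &a);   /* 01 02 00 00 -> the int 513 */
    printf("a = %d\n", a);
    return 0;
}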
For simplicity, assume the following:
the size of int is 4 bytes
the size of any pointer type is 8 bytes
the size of char is 1 byte
In the line x = (char *) &a;, x refers to a as a char: x thinks it is pointing to a char (it has no idea that a is actually an int).
The line a = 512; is meant to confuse you. Don't let it.
In x[0] = 1;, since x thinks it is pointing to a char, the assignment changes just the first byte of a.
In x[1] = 2;, once again, x changes just the second byte of a.
Note that the values written by these two assignments override what a = 512; had stored in those bytes.
The value of a is now 0...0000 0010 0000 0001 (513).
Now when we print a as an int, all 4 bytes are considered, as expected.
Let me try to break this down for you in addition to the previous answers:
#include <stdio.h>
int main()
{
    int a;              //declares an integer called a
    char *x;            //declares a pointer to a character called x
    x = (char *) &a;    //points x to the first byte of a
    a = 512;            //writes 512 to the int variable
    x[0] = 1;           //writes 1 to the first byte
    x[1] = 2;           //writes 2 to the second byte
    printf("%d\n",a);   //prints the integer
    return 0;
}
Note that I wrote first byte and second byte. Depending on the byte order of your platform and the size of an integer, you might not get the same results.
Let's look at the memory for 32-bit (4-byte) integers:
Little endian systems
first byte | second byte | third byte | fourth byte
   0x00         0x02          0x00         0x00
Now assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
   0x01         0x02          0x00         0x00
Notice that the first byte gets changed to 0x01 while the second was already 0x02.
This new number in memory is equivalent to 513 on little endian systems.
Big endian systems
Let's look at what would happen if you were trying this on a big endian platform:
first byte | second byte | third byte | fourth byte
   0x00         0x00          0x02         0x00
This time assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
   0x01         0x02          0x02         0x00
Which is equivalent to 16,908,800 as an integer.
I'm not able to grasp how the output is 513 or even machine dependent
The output is implementation-defined. It depends on the order of bytes in the CPU's representation of integers, commonly known as endianness.
I can sense that typecasting is playing a major role
The code reinterprets the value of a, which is an int, as an array of bytes. It uses the first two bytes, which is guaranteed to work, because an int is at least two bytes in size.
Can someone help me visualise this problem?
An int consists of multiple bytes. They can be addressed as one unit that represents an integer, but they can also be addressed as a collection of bytes. The value of an int depends on the bytes that you set, and on the order of those bytes in the CPU's representation of integers.
It looks like your system stores the least significant byte at the lowest address, so the result of storing 1 and 2 at offsets zero and one produces this layout:
Byte 0   Byte 1   Byte 2   Byte 3
------   ------   ------   ------
   1        2        0        0
Integer value can be computed as follows:
1 + 2*256 + 0*65536 + 0*16777216
By taking x, which is a char *, and pointing it to the address of a, which is an int, you can use x to modify the individual bytes that represent a.
The output you're seeing suggests that an int is stored in little-endian format, meaning the least significant byte comes first. This can change, however, if you run this code on a different system (e.g. a Sun SPARC machine, which is big-endian).
You first set a to 512. In hex, that's 0x200. So the memory for a, assuming a 32 bit int in little endian format, is laid out as follows:
-----------------------------
| 0x00 | 0x02 | 0x00 | 0x00 |
-----------------------------
Next you set x[0] to 1, which updates the first byte in the representation of a (changing it from 0x00 to 0x01):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Then you set x[1] to 2, which updates the second byte in the representation of a (in this case leaving it unchanged, since it was already 0x02):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Now a has a value of 0x201, which in decimal is 513.
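Since several answers stress that the result is machine-dependent, here is a minimal sketch of one common way to check the byte order at runtime (just an idiom for illustration, not the only approach):

#include <stdio.h>

int main (void)
{
    unsigned int probe = 1;
    unsigned char *first = (unsigned char *) &probe;

    /* On a little-endian machine the least significant byte (1) is stored first;
       on a big-endian machine the first byte is 0. */
    if (*first == 1)
        printf("little-endian: x[0] is the least significant byte of a\n");
    else
        printf("big-endian: x[0] is the most significant byte of a\n");
    return 0;
}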

How the char array in union is working?

static int i = 2;
union U {
    int a, b;
    char c[3];
} u;
int main(){
    u.b = 0x6;
    for(;i; u.b++)
        u.b = u.a << i--;
    printf("%d %o %s", u.a, u.b, u.c);
    return 0;
}
This code gives the output for the character array as 3. Now I know that this code poses several undefined-behaviour issues, especially since I am storing into one union member and reading another, but just for the purpose of experiment, can anybody explain to me why u.c has the value 3?
Note: an explanation of the internal memory structure would help me understand this better.
After the for loop the union u contains the bits:
0x00000033
which split into chars are
0x33 0x00 0x00
so
c[0]=0x33
c[1]=0x00
c[2]=0x00
and 0x33 happens to be the ASCII code for the digit '3'.
u.a, u.b and c's bytes all occupy the same memory. Since u.a and u.b have the same type, they are essentially the same variable. The loop
int i = 2;
u.b = 6;
for(;i; u.b++)
    u.b = u.a << i--;
can be written (using only u.b, for clarity) as:
u.b = 6;
u.b = u.b << 2; // u.b is now 24 (each left shift by one bit multiplies by 2)
u.b++;          // u.b is now 25
u.b = u.b << 1; // u.b is now 50
u.b++;          // u.b is now 51
Now the memory layout of a 32-bit integer on a PC that stores the low byte first is, byte-wise, 51-00-00-00.
Interpreting these bytes as a string, as you told printf to do with the %s conversion, means that 51 is taken as an ASCII value, denoting the digit '3'. Fortunately the next byte is indeed 0, because the integer is small, so the string is terminated. printf will print 3.
You can simply test it by printing the hex code of a with:
printf("\n%X\n", u.a);
The output will be 33, i.e. 0x33, which is ASCII '3'.
The for loop does:
Start with b = 0x06
Then left shift b by 2 => b = 0x18
Increment b => b = 0x19
Then left shift b by 1 => b = 0x32
Increment b => b = 0x33
You defined a union, so a coincides with b.
The first 3 bytes of a and b are also accessible through c.
By the way, the printf output depends on the endianness of the data.
In your case, little endian, printf prints ASCII '3', because c is:
c[0]=0x33
c[1]=0x00
c[2]=0x00
In the case of big endian, printf prints nothing, because c is:
c[0]=0x00
c[1]=0x00
c[2]=0x00
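To see the layout directly, you can dump the union's bytes after the loop. A minimal sketch (assuming a 4-byte int and the same union as the question, with the loop's final value assigned directly):

#include <stdio.h>

union U {
    int a, b;
    char c[3];
};

int main (void)
{
    union U u;
    u.b = 0x33;   /* the value the loop in the question ends up with */

    /* Dump the bytes that u.b occupies. */
    const unsigned char *bytes = (const unsigned char *) &u;
    for (size_t i = 0; i < sizeof(u.b); i++)
        printf("byte %zu: 0x%02X\n", i, bytes[i]);

    printf("as a string: %s\n", u.c);   /* prints "3" on a little-endian machine */
    return 0;
}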

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints out 7 2 3 0; I expected it to print out 1 9 7 1. Can anyone explain why it is printing 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of separate decimal digits.
197127 (decimal) = 0x00030207 (hex) = 0011 0000 0010 0000 0111 (binary)
Supposing your system uses little endian, 0x00030207 would be stored in memory as 0x07 0x02 0x03 0x00, which is printed out as 7 2 3 0, as expected, when you print out each byte.
Because with your method you print out the internal representation of the unsigned int and not its decimal representation.
Integers, like any other data, are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the character codes of the digits.
The binary representation of 197127 is 0011 0000 0010 0000 0111.
Byte by byte that is 0000 0111 (7 decimal), 0000 0010 (2), 0000 0011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value of e, 197127, is not stored as a string. It is stored as a 16- or 32-bit integer (depending on the platform). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location. In binary it looks like 11 0000 0010 0000 0111. Note that the byte order is actually reversed in memory (see a reference on endianness). So, when you point f to &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you should have
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while(e > 0) { digits[c++] = e % 10; e /= 10; }
while(c > 0) { printf("%u\n", digits[--c]); }
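Wrapped into a complete program (a minimal sketch based on the snippet above; the space-separated output format is my choice):

#include <stdio.h>

int main (void)
{
    unsigned int e = 197127;
    int digits[10];   /* enough for any 32-bit unsigned value */
    int c = 0;

    /* Peel off the decimal digits, least significant first. */
    while (e > 0) {
        digits[c++] = e % 10;
        e /= 10;
    }

    /* Print them back most significant first. */
    while (c > 0)
        printf("%d ", digits[--c]);
    printf("\n");   /* prints: 1 9 7 1 2 7 */

    return 0;
}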
As you know, an int usually occupies four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010, then 00000011, and finally 00000000. So when you print *f, you obtain 7. After f++, f points to 00000010, and the output is 2. The rest can be deduced by analogy.
The underlying representation of the number e is binary, and if we convert the value to hex we can see that the value is (assuming a 32-bit unsigned int):
0x00030207
So when you iterate over the contents you are reading it byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order is little endian, since the least significant byte (0x07) comes first. In memory the contents look like this:

    0x07 0x02 0x03 0x00
     ^    ^    ^    ^-- fourth byte
     |    |    +------- third byte
     |    +------------ second byte
     +----------------- first byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
                    ^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To have the result you expect, you should change the declaration and assignment of e to:
unsigned char *e = "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *) s;
Or, use mathematical operations to get the single digits from your integer and print those out.
Or, ...
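Putting the fixes mentioned above together (the %zu specifier for sizeof and the unsigned char cast), a corrected version of the question's snippet might look roughly like this sketch:

#include <stdio.h>

int main (void)
{
    unsigned int e = 197127;
    unsigned char *f = (unsigned char *) &e;   /* cast to unsigned char *, not char * */

    printf("%zu\n", sizeof(e));                /* %zu is the correct specifier for size_t */
    printf("%d ", f[0]);
    printf("%d ", f[1]);
    printf("%d ", f[2]);
    printf("%d\n", f[3]);                      /* still 7 2 3 0 on a little-endian machine */
    return 0;
}

Note that this still prints the bytes (7 2 3 0); to get the decimal digits you need one of the approaches above, such as sprintf() or the digit loop.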
