using strtol on bytes stored in char array - c

I'm trying to extract the 2nd and 3rd bytes from a char array and interpret their value as an integer. In this case I want to extract 0x01 and 0x18 and interpret the pair as 0x118, i.e. 280 decimal, using strtol. But the output act len is 0.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char str[] = {0x82, 0x01, 0x18, 0x7d};
    char *len_f_str = malloc(10);
    memset(len_f_str, '\0', sizeof(len_f_str));
    strncpy(len_f_str, str + 1, 2);
    printf("%x\n", str[1] & 0xff);
    printf("%x\n", len_f_str[1] & 0xff);
    printf("act len:%ld\n", strtol(len_f_str, NULL, 16));
    return 0;
}
Output:
bash-3.2$ ./a.out
1
18
act len:0
What am I missing here? Help appreciated.

strtol converts an ASCII representation of a string to a value, not the actual bits.
Try this:
short *myShort = (short *) &str[1]; /* reinterpret the two bytes as a short (alignment and endianness dependent) */
long myLong = (long) *myShort;

strtol expects the input to be a sequence of characters representing the number in a printable form. To represent 0x118 you would use
char *num = "118";
If you then pass num to strtol, and give it a radix of 16, you would get your 280 back.
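For instance, as a minimal runnable sketch of that call:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *num = "118";
    long val = strtol(num, NULL, 16); /* parses the text "118" as base 16 */
    printf("%ld\n", val);             /* prints 280 */
    return 0;
}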
If your number is represented as a sequence of bytes, you could use simple math to compute the result:
unsigned char str[]={0x82,0x01,0x18,0x7d};
unsigned int res = str[1] << 8 | str[2];
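Put together as a runnable sketch of this approach:
#include <stdio.h>

int main(void) {
    unsigned char str[] = {0x82, 0x01, 0x18, 0x7d};
    /* high byte shifted left by 8, combined with the low byte */
    unsigned int res = (unsigned int)str[1] << 8 | str[2];
    printf("0x%x (%u)\n", res, res); /* prints 0x118 (280) */
    return 0;
}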

Related

Char automatically converts to int (I guess)

I have the following code
char temp[] = { 0xAE, 0xFF };
printf("%X\n", temp[0]);
Why is the output FFFFFFAE, not just AE?
I tried
printf("%X\n", 0b10101110);
And output is correct: AE.
Suggestions?
The answer you're getting, FFFFFFAE, is a result of the char data type being signed. If you check the value, you'll notice that it's equal to -82, where -82 + 256 = 174, or 0xAE in hexadecimal.
The reason you get the correct output when you print 0b10101110 or even 174 is because you're using the literal values directly, whereas in your example the 0xAE value is first stored in a signed char, where it wraps around into the signed range (174 becomes 174 - 256 = -82).
So in other words:
0 = 0 = 0x00
127 = 127 = 0x7F
128 = -128 = 0xFFFFFF80
129 = -127 = 0xFFFFFF81
174 = -82 = 0xFFFFFFAE
255 = -1 = 0xFFFFFFFF
256 = 0 = 0x00
To fix this "problem", declare the array as unsigned char instead, and your values should print as you expect.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned char temp[] = { 0xAE, 0xFF };

    printf("%X\n", temp[0]);
    printf("%d\n\n", temp[0]);

    printf("%X\n", temp[1]);
    printf("%d\n\n", temp[1]);

    return EXIT_SUCCESS;
}
Output:
AE
174
FF
255
https://linux.die.net/man/3/printf
According to the man page, %x or %X accepts an unsigned integer, so printf reads a full int's worth of bytes for it.
In any case, under most architectures you can't pass a parameter that is less than a word (i.e. int or long) in size, and in your case it will be converted to int.
In the first case, you're passing a char, so it is converted to int. Both are signed, so sign extension is performed, and thus you see the leading FFs.
In your second example, you're actually passing an int all the way, so no cast is performed.
If you'd try:
printf("%X\n", (char) 0b10101110);
You'd see that FFFFFFAE will be printed.
When you pass a smaller-than-int data type (as char is) to a variadic function (as printf(3) is), the parameter undergoes the default argument promotions and is converted to int. What you observe is sign extension: since the most significant bit of the char variable is set, it is replicated into the three bytes needed to complete an int.
To solve this and to have the data in 8 bits, you have two possibilities:
Allow your signed char to convert to an int (with sign extension) then mask the bits 8 and above.
printf("%X\n", (int) my_char & 0xff);
Declare your variable as unsigned char, so its promoted value is non-negative.
unsigned char my_char;
...
printf("%X\n", my_char);
This code causes undefined behaviour. The argument to %X must have type unsigned int, but you supply char.
Undefined behaviour means that anything can happen; including, but not limited to, extra F's appearing in the output.
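For completeness, a minimal sketch of two well-defined ways to print a single byte in hex (the hh length modifier is C99):
#include <stdio.h>

int main(void) {
    char temp[] = { (char)0xAE, (char)0xFF };

    printf("%X\n", (unsigned char)temp[0]);   /* convert first: prints AE */
    printf("%hhX\n", (unsigned char)temp[0]); /* hh says the value is an unsigned char: prints AE */
    return 0;
}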

8 Byte Number as Hex in C

I have given a number, for example n = 10, and I want to calculate its length in hex with big endian and save it in an 8 byte char pointer. In this example I would like to get the following string:
"\x00\x00\x00\x00\x00\x00\x00\x50".
How do I do that automatically in C with for example sprintf?
I am not even able to get "\x50" in a char pointer:
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
sprintf(tmp, "\\x%x", 50); // version 2
Version 1 and 2 don't work.
I have given a number, for example n = 10, and I want to calculate its length in hex
Repeatedly divide by 16 to find the number of hexadecimal digits. A do ... while ensures the result is 1 when n==0.
int hex_length = 0;
do {
    hex_length++;
} while (number /= 16);
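Wrapped into a runnable sketch (the value 280 is chosen only for illustration):
#include <stdio.h>

int main(void) {
    unsigned int number = 280; /* 0x118: three hex digits */
    int hex_length = 0;
    do {
        hex_length++;
    } while (number /= 16);
    printf("%d\n", hex_length); /* prints 3 */
    return 0;
}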
save it in an 8 byte char pointer.
C cannot force your system to use 8-byte pointers. So if your system uses 4-byte char pointers, we are out of luck. Let us assume OP's system uses 8-byte pointers. Integers may be assigned to pointers (with a cast), though this may or may not result in a valid pointer.
assert(sizeof (char *) == 8);
char *char_pointer = (char *) n; /* explicit cast required; the result may not be a valid pointer */
printf("%p\n", (void *) char_pointer);
In this example I would like to get the following string: "\x00\x00\x00\x00\x00\x00\x00\x50".
In C, a string includes the various characters up to and including a null character. "\x00\x00\x00\x00\x00\x00\x00\x50" is not a valid C string, yet it is a valid string literal. Code cannot construct string literals at run time; they are part of the source code. Further, the relationship between n==10 and "\x00...\x00\x50" is unclear. Instead, perhaps the goal is to store n into an 8-byte array (big endian).
char buf[8];
for (int i = 7; i >= 0; i--) { /* fill from the end: lowest byte lands in buf[7], big endian */
    buf[i] = (char) n;
    n /= 256;
}
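A runnable sketch that fills the array and prints it in the \xNN form from the question (n = 0x50 chosen so the output ends in \x50):
#include <stdio.h>

int main(void) {
    unsigned long long n = 0x50;
    unsigned char buf[8];

    for (int i = 7; i >= 0; i--) { /* lowest byte of n lands in buf[7]: big endian */
        buf[i] = (unsigned char) n;
        n /= 256;
    }
    for (int i = 0; i < 8; i++)
        printf("\\x%02x", buf[i]); /* prints \x00\x00\x00\x00\x00\x00\x00\x50 */
    printf("\n");
    return 0;
}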
OP's code certainly will fail, as it attempts to store a string into a buffer that is too small. Further, "\x%x" is not valid code, as \x begins an invalid escape sequence.
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
Just do:
int i;
...
int length = (int) floor(log(i) / log(16)) + 1; /* needs math.h; assumes i > 0 */
This will give you (in length) the number of hexadecimal digits needed to represent i (without 0x of course).
log(i) / log(base) is the log-base of i, so log(i) / log(16) is the log16 of i, which is the exponent: raising 16 to the power of that exponent gives back i, 16^log16(i) = i.
Flooring the exponent and adding one turns it into a digit count (watch out for floating-point rounding at exact powers of 16).

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints out 7 2 3 0. I expected it to print 1 9 7 1. Can anyone explain why it is printing 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of separate decimal digits.
197127 (decimal) = 00030207 (hex) = 0000 0000 0000 0011 0000 0010 0000 0111 (binary)
Supposing your system uses little endian, 0x00030207 would be stored in memory as 0x07 0x02 0x03 0x00, which is printed out as 7 2 3 0, as expected, when you print each byte.
Because with your method you print out the internal representation of the unsigned int, not its decimal representation.
Integers, like all other data, are represented as bytes internally; unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the character codes of the digits.
The binary representation of 197127 is 00110000001000000111.
The bytes look like 00000111 (7 decimal), 00000010 (2), and 0011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value of e, 197127, is not a string representation. It is stored as a 16- or 32-bit integer (depending on platform). So in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location; in binary it looks like 110000001000000111. Note that the byte order in memory is actually backwards on a little-endian machine (look up endianness). So when you point f at &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you should have
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
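The same round trip in a runnable sketch, reading the bytes through an unsigned char pointer and rebuilding the value (assumes a little-endian host, as in the question):
#include <stdio.h>

int main(void) {
    unsigned int e = 197127; /* 0x00030207 */
    unsigned char *f = (unsigned char *) &e;

    unsigned int back = f[0] | (f[1] << 8) | (f[2] << 16) | ((unsigned int)f[3] << 24);
    printf("%d %d %d %d -> %u\n", f[0], f[1], f[2], f[3], back);
    /* on a little-endian machine: 7 2 3 0 -> 197127 */
    return 0;
}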
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while(e > 0) { digits[c++] = e % 10; e /= 10; }
while(c > 0) { printf("%u\n", digits[--c]); }
An int typically occupies four bytes, which means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian: the low byte 00000111 is stored at the lowest address, then 00000010, then 00000011, and finally 00000000. So when you first dereference f you obtain 7; after f++, f points to 00000010 and the output is 2. The rest follows by analogy.
The underlying representation of the number e is binary, and if we convert the value to hex we can see that it would be (assuming a 32-bit unsigned int):
0x00030207
So when you iterate over the contents, you are reading it byte by byte through the unsigned char pointer. Each byte contains two 4-bit hex digits, and the byte order of the number is little endian, since the least significant byte (0x07) comes first; in memory the contents look like this:
0x07020300
^ ^ ^ ^- Fourth byte
| | |-Third byte
| |-Second byte
|-First byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To have the result you expect, you should change the declaration and assignment of e to:
unsigned char *e = (unsigned char *) "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *) s;
Or, use mathematical operations to get single digits from your integer and print those out.
Or, ...
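For instance, the sprintf() route as a minimal runnable sketch (buffer size chosen for illustration):
#include <stdio.h>

int main(void) {
    unsigned int e = 197127;
    char s[16];

    sprintf(s, "%u", e); /* s now holds the text "197127" */
    for (char *p = s; *p; p++)
        printf("%c ", *p); /* prints 1 9 7 1 2 7 */
    printf("\n");
    return 0;
}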

C - unsigned int to unsigned char array conversion

I have an unsigned int number (2 bytes) and I want to convert it to unsigned char type. From my search, I find that most people recommend doing the following:
unsigned int x;
...
unsigned char ch = (unsigned char)x;
Is this the right approach? I ask because unsigned char is 1 byte and we cast from 2 bytes of data down to 1 byte.
To prevent any data loss, I want to create an array of unsigned char[] and save the individual bytes into the array. I am stuck at the following:
unsigned char ch[2];
unsigned int num = 272;
for(i=0; i<2; i++){
// how should the individual bytes from num be saved in ch[0] and ch[1] ??
}
Also, how would we convert the unsigned char[2] back to unsigned int.
Thanks a lot.
You can use memcpy in that case:
memcpy(ch, &num, 2); /* 2 matches both sizeof num on the asker's platform and the size of ch */
Also, how would we convert the unsigned char[2] back to unsigned int.
The same way, just reverse the arguments of memcpy.
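For example, a sketch of the round trip; note that the destination is zeroed first, since sizeof(int) may be larger than the two bytes copied back:
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int num = 272;
    unsigned char ch[2];

    memcpy(ch, &num, 2);   /* int -> bytes */

    unsigned int back = 0; /* clear the bytes memcpy won't touch */
    memcpy(&back, ch, 2);  /* bytes -> int */
    printf("%u\n", back);  /* prints 272: same byte order both ways */
    return 0;
}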
How about:
ch[0] = num & 0xFF;
ch[1] = (num >> 8) & 0xFF;
The converse operation is left as an exercise.
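A sketch of what that exercise might look like, reassembling with shifts to mirror the extraction above:
#include <stdio.h>

int main(void) {
    unsigned int num = 272; /* 0x0110 */
    unsigned char ch[2];

    ch[0] = num & 0xFF;        /* low byte: 0x10 */
    ch[1] = (num >> 8) & 0xFF; /* high byte: 0x01 */

    unsigned int back = ch[0] | ((unsigned int)ch[1] << 8);
    printf("%u\n", back);      /* prints 272 */
    return 0;
}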
How about using a union?
union {
    unsigned int num;
    unsigned char ch[2];
} theValue;

theValue.num = 272;
printf("The two bytes: %d and %d\n", theValue.ch[0], theValue.ch[1]);
It really depends on your goal: why do you want to convert this to an unsigned char? Depending on the answer to that there are a few different ways to do this:
Truncate: This is what was recommended above. If you are just trying to squeeze data into a function which requires an unsigned char, simply cast: unsigned char ch = (unsigned char) x (but, of course, beware of what happens if your int is too big).
Specific endian: Use this when your destination requires a specific format. Usually networking code likes everything converted to big endian arrays of chars:
int n = sizeof x;
for (int y = 0; n-- > 0; y++)
    ch[y] = (x >> (n * 8)) & 0xff;
does that.
Machine endian. Use this when there is no endianness requirement, and the data will only occur on one machine. The order of the array will change across different architectures. People usually take care of this with unions:
union { int x; char ch[sizeof(int)]; } u;
u.x = 0xf00;
// use u.ch
with memcpy:
unsigned char ch[sizeof(int)];
memcpy(ch, &x, sizeof x);
or with a plain cast (well-defined for character types, though the byte order you see depends on the machine):
unsigned char *ch = (unsigned char *) &x;
Of course, an array of chars large enough to contain a larger value has to be exactly as big as that value itself.
So you can simply pretend that the larger value already is an array of chars:
unsigned int x = 12345678;//well, it should be just 1234.
unsigned char* pChars;
pChars = (unsigned char*) &x;
pChars[0];//one byte is here
pChars[1];//another byte here
(Once you understand what's going on, it can be done without any variables, all just casting)
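For example, as a single expression with no intermediate pointer variable (the byte value shown assumes a little-endian machine):
#include <stdio.h>

int main(void) {
    unsigned int x = 1234; /* 0x4D2 */
    /* first byte of x's object representation */
    printf("%d\n", ((unsigned char *) &x)[0]); /* prints 210 (0xD2) on little endian */
    return 0;
}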
You just need to extract those bytes using the bitwise & operator; 0xFF is a hexadecimal mask that extracts one byte. Please look at various bit operations here: http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/
An example program is as follows:
#include <stdio.h>

int main()
{
    unsigned int i = 0x1122;
    unsigned char c[2];

    c[0] = i & 0xFF;
    c[1] = (i >> 8) & 0xFF;

    printf("c[0] = %x \n", c[0]);
    printf("c[1] = %x \n", c[1]);
    printf("i = %x \n", i);

    return 0;
}
Output:
$ gcc 1.c
$ ./a.out
c[0] = 22
c[1] = 11
i = 1122
$
Endorsing @abelenky's suggestion, using a union would be a more foolproof way of doing this.
union unsigned_number {
    unsigned int value;     // An int is 4 bytes long
    unsigned char index[4]; // A char is 1 byte long
};
The characteristic of this type is that the compiler allocates memory only for the biggest member of our data structure unsigned_number, which in this case is going to be 4 bytes, since both members (value and index) have the same size. Had you defined it as a struct instead, we would have 8 bytes allocated in memory, since the compiler allocates space for every member of a struct.
Additionally, and here is where your problem is solved, the members of a union all share the same memory location, which means they all refer to the same data; think of that like a hard link on GNU/Linux systems.
So we would have:
union unsigned_number my_number;

// Assigning decimal value 202050300 to my_number,
// which is represented as 0xC0B0AFC in hex format
my_number.value = 0xC0B0AFC; // Representation: Binary - Decimal
                             // Byte 3: 00001100 - 12
                             // Byte 2: 00001011 - 11
                             // Byte 1: 00001010 - 10
                             // Byte 0: 11111100 - 252

// Printing out my_number one byte at a time
for (int i = 0; i < (sizeof(my_number.value)); i++)
{
    printf("index[%d]: %u, 0x%x\n",
           i, my_number.index[i], my_number.index[i]);
}

// Printing out my_number as an unsigned integer
printf("my_number.value: %u, 0x%x", my_number.value, my_number.value);
And the output is going to be:
index[0]: 252, 0xfc
index[1]: 10, 0xa
index[2]: 11, 0xb
index[3]: 12, 0xc
my_number.value: 202050300, 0xc0b0afc
And as for your final question: we wouldn't have to convert from unsigned char back to unsigned int, since the values are already there. You just have to choose which way you want to access them.
Note 1: I am using an integer of 4 bytes in order to ease the understanding of the concept. For the problem you presented you must use:
union unsigned_number {
    unsigned short int value; // A short int is 2 bytes long
    unsigned char index[2];   // A char is 1 byte long
};
Note 2: I have assigned byte 0 the value 252 in order to point out the unsigned characteristic of our index field. Had it been declared as a signed char, we would have index[0]: -4, 0xfc as output.

Why does C print my hex values incorrectly?

So I'm a bit of a newbie to C and I am curious to figure out why I am getting this unusual behavior.
I am reading a file 16 bits at a time and just printing them out as follows.
#include <stdio.h>

#define endian(hex) (((hex & 0x00ff) << 8) + ((hex & 0xff00) >> 8))

int main(int argc, char *argv[])
{
    const int SIZE = 2;
    const int NMEMB = 1;
    FILE *ifp; // input file pointer
    FILE *ofp; // output file pointer
    int i;
    short hex;

    for (i = 2; i < argc; i++)
    {
        // Reads the header and stores the bits
        ifp = fopen(argv[i], "r");
        if (!ifp) return 1;

        while (fread(&hex, SIZE, NMEMB, ifp))
        {
            printf("\n%x", hex);
            printf("\n%x", endian(hex)); // this prints what I expect
            printf("\n%x", hex);
            hex = endian(hex);
            printf("\n%x", hex);
        }
    }
}
The results look something like this:
ffffdeca
cade // expected
ffffdeca
ffffcade
0
0 // expected
0
0
600
6 // expected
600
6
Can anyone explain to me why the last line in each block doesn't print the same value as the second?
The placeholder %x in the format string interprets the corresponding parameter as unsigned int.
To print the parameter as short, add a length modifier h to the placeholder:
printf("%hx", hex);
http://en.wikipedia.org/wiki/Printf_format_string#Format_placeholders
This is due to integer type-promotion.
Your shorts are being implicitly promoted to int (which is 32 bits here), so these are sign-extension promotions in this case.
Therefore, your printf() is printing out the hexadecimal digits of the full 32-bit int.
When your short value is negative, the sign-extension will fill the top 16 bits with ones, thus you get ffffcade rather than cade.
The reason why this line:
printf("\n%x", endian(hex));
seems to work is that your macro implicitly gets rid of the upper 16 bits.
You have implicitly declared hex as a signed value (to make it unsigned, write unsigned short hex), so any value of 0x8000 or above is considered negative. When printf displays it as a 32-bit int value, it is sign-extended with ones, causing the leading Fs. When you print the return value of endian before truncating it by assigning it back to hex, the full 32 bits are available and are printed correctly.
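A minimal sketch of both fixes side by side (the value 0xcade is illustrative):
#include <stdio.h>

int main(void) {
    short hex = (short) 0xcade;   /* negative when stored in a signed short */
    unsigned short uhex = 0xcade;

    printf("%x\n", hex);  /* sign-extended on promotion: typically ffffcade */
    printf("%hx\n", hex); /* h modifier truncates back to short: cade */
    printf("%x\n", uhex); /* unsigned: promotes to a non-negative value: cade */
    return 0;
}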
