I'm trying to convert a 2-byte array to an unsigned short.
This is the code for the conversion:
short bytesToShort(char* bytesArr)
{
short result =(short)((bytesArr[1] << 8)|bytesArr[0]);
return result;
}
I have an input file that stores bytes. I read it in a loop, 2 bytes at a time, and store them in a char array N like this:
char N[3];
N[2]='\0';
while(fread(N,1,2,inputFile)==2)
When the (hex) value of N[0] is 0 the computation is correct; otherwise it's wrong. For example:
0x62 (N[0]=0x0,N[1]=0x62) returns 98 (as a short), but 0x166 (N[0]=0x6,N[1]=0x16) returns 5638 (as a short).
In the first place, it's generally best to use type unsigned char for the bytes of raw binary data, because that correctly expresses the semantics of what you're working with. Type char, although it can be, and too frequently is, used as a synonym for "byte", is better reserved for data that are actually character in nature.
In the event that you are furthermore performing arithmetic on byte values, you almost surely want unsigned char instead of char, because the signedness of char is implementation-defined. It does vary among implementations, and on many common implementations char is signed.
With that said, your main problem appears simple. You said
166 in hex (N[0]=6,N[1]=16) will return 5638 (in short value).
but 0x166 packed into a two-byte little-endian array would be (N[0]=0x66,N[1]=0x1). What you wrote would correspond to 0x1606, which indeed is the same as decimal 5638.
The problem is sign extension due to using char. You should use unsigned char instead:
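#include <stdio.h>
short bytesToShort(const unsigned char* bytesArr)
{
    short result = (short)((bytesArr[1] << 8) | bytesArr[0]);
    return result;
}
int main(void)
{
    // String literals are arrays of char, so cast them to match the parameter type.
    // Cast the result to unsigned short so %04x prints 16 bits, not a sign-extended int.
    printf("%04x\n", (unsigned short)bytesToShort((const unsigned char*)"\x00\x11")); // expect 1100
    printf("%04x\n", (unsigned short)bytesToShort((const unsigned char*)"\x55\x11")); // expect 1155
    printf("%04x\n", (unsigned short)bytesToShort((const unsigned char*)"\xcc\xdd")); // expect ddcc
    return 0;
}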
Note: the problem in the original code is not the one the OP described. With a signed char, the input "\xcc\xdd" gives the wrong result: sign extension produces 0xffcc where it should be 0xddcc.
Related
When a value overflows its type, does the accessible memory change, or is the compiler merely told to treat the variable as the given type?
Example:
#include <stdio.h>
int main()
{
char a;
a = 123456789;
printf("ans is %d\n",(int)a);
}
Output:
overflow in implicit constant conversion a= 123456789.
ans is 21.
Here I know why it's causing overflow. But I want to know how memory is accessed when an overflow occurs.
This is kind of simple: Since char typically only holds one byte, only a single byte of 123456789 will be copied to a. Exactly how depends on if char is signed or unsigned (it's implementation-specific which one it is). For the exact details see e.g. this integer conversion reference.
What typically happens (I haven't seen any compiler do anything different) is that the least significant byte of the value is copied, unmodified, into a.
For 123456789, the hexadecimal representation of the value is 0x75bcd15. Here you can easily see that the lowest byte is 0x15, which is 21 in decimal.
The cast to int when you print the value does nothing that wouldn't happen anyway. When you call a variable-argument function like printf, values of a type smaller than int are promoted to int. Your printf call is equivalent to
printf("ans is %d\n",a);
In my intro to operating systems course, our task is to determine whether a system is big or little endian. There are plenty of results I've found on how to do it, and I've done my best to write my own version of the code. I suspect it's not the best way of doing it, but it seems to work:
#include <stdio.h>
int main() {
int a = 0x1234;
unsigned char *start = (unsigned char*) &a;
int len = sizeof( int );
if( start[0] > start[ len - 1 ] ) {
//biggest in front (Little Endian)
printf("1");
} else if( start[0] < start[ len - 1 ] ) {
//smallest in front (Big Endian)
printf("0");
} else {
//unable to determine with set value
printf( "Please try a different integer (non-zero). " );
}
}
I've seen this line of code (or some version of) in almost all answers I've seen:
unsigned char *start = (unsigned char*) &a;
What is happening here? I understand casting in general, but what happens if you cast a pointer to int to a char pointer? I know:
unsigned int *p = &a;
assigns the memory address of a to p, and that you can affect the value of a through dereferencing p. But I'm totally lost with what's happening with the char and, more importantly, not sure why my code works.
Thanks for helping me with my first SO post. :)
When you cast between pointers of different types, the result is generally implementation-defined (it depends on the system and the compiler). There are no guarantees that you can access the pointed-to data through the new pointer or that the pointer is correctly aligned.
But for the special case when you cast to a pointer to character, the standard actually guarantees that you get a pointer to the lowest addressed byte of the object (C11 6.3.2.3 §7).
So the compiler will implement the code you have posted in such a way that you get a pointer to the lowest addressed byte of the int. As we can tell from your code, that byte may contain different values depending on endianness: it is the least significant byte on a little-endian system and the most significant byte on a big-endian one.
If int is 16 bits wide, the char pointer will point at memory containing 0x12 in case of big endian, or 0x34 in case of little endian.
If int is 32 bits wide, the int would contain 0x00001234, so you would get 0x00 in case of big endian and 0x34 in case of little endian.
If you dereference an int pointer you get sizeof(int) bytes of data (4 with gcc on common platforms). If you want only one byte, cast that pointer to a character pointer and dereference it; you will get one byte of data. Casting the pointer tells the compiler to read that many bytes instead of the size of the original data type.
Values stored in memory are a set of '1's and '0's which by themselves do not mean anything. Datatypes are used for recognizing and interpreting what the values mean. So let's say that, at a particular memory location, the data stored is the following set of bits, continuing indefinitely: 01001010 ..... By itself this data is meaningless.
A pointer (other than a void pointer) contains 2 pieces of information. It contains the starting position of a set of bytes, and the way in which the set of bits are to be interpreted. For details, you can see: http://en.wikipedia.org/wiki/C_data_types and references therein.
So if you have
a char *c,
a short int *i,
and a float *f
which look at the bits mentioned above, c, i, and f hold the same address, but *c takes the first 8 bits and interprets them in a certain way. So you can do things like printf("The character is %c", *c). On the other hand, *i takes the first 16 bits (where short is 16 bits wide) and interprets them in a certain way. In this case, it is meaningful to say printf("The integer is %d", *i). Again, for *f, printf("The float is %f", *f) is meaningful.
The real differences come when you do math with these. For example,
c++ advances the pointer by 1 byte,
i++ advances it by sizeof(short) bytes (typically 2),
and f++ advances it by sizeof(float) bytes (typically 4).
More importantly, for
(*c)++, (*i)++, and (*f)++ the algorithm used for doing the addition is totally different.
In your question, when you cast from one pointer type to another, you already know that the algorithm you are going to use for manipulating the bits at that location will be easier if you interpret those bits as an unsigned char rather than an unsigned int. The same operators (+, -, etc.) act differently depending on the datatype they are applied to. If you have worked on physics problems where a coordinate transformation made the solution very simple, that is the closest analog to this operation. You are transforming one problem into another that is easier to solve.
I recently came across this question, where the OP was having issues printing the hexadecimal value of a variable. I believe the problem can be summed by the following code:
#include <stdio.h>
int main() {
char signedChar = 0xf0;
printf("Signed\n");
printf("Raw: %02X\n", signedChar);
printf("Masked: %02X\n", signedChar & 0xff);
printf("Cast: %02X\n", (unsigned char)signedChar);
return 0;
}
This gives the following output:
Signed
Raw: FFFFFFF0
Masked: F0
Cast: F0
The format string used for each of the prints is %02X, which I’ve always interpreted as ‘print the supplied int as a hexadecimal value with at least two digits’.
The first case passes signedChar as an argument and prints out the wrong value (because the other three bytes of the int have all of their bits set).
The second case gets around this problem, by applying a bit mask (0xFF) against the value to remove all but the least significant byte, where the char is stored. Should this work? Surely: signedChar == signedChar & 0xFF?
The third case gets around the problem by casting the character to an unsigned char (which seems to clear the top three bytes?).
For each of the three cases above, can anybody tell me if the behavior is defined? How/where?
I don't think this behavior is completely defined by the C standard. After all, it depends on the binary representation of signed values. I will just describe how it's likely to work.
printf("Raw: %02X\n", signedChar);
(char)0xf0, which can be written as (char)-16, is promoted to (int)-16; its hex representation is 0xfffffff0.
printf("Masked: %02X\n", signedChar & 0xff);
0xff is of type int, so before & is calculated, signedChar is promoted to (int)-16.
((int)-16) & ((int)0xff) == (int)0x000000f0.
printf("Cast: %02X\n", (unsigned char)signedChar);
(unsigned char)0xf0, which can be written as (unsigned char)240, is promoted to (int)240; in hex that is 0x000000f0.
I read that C does not define whether a char is signed or unsigned, and the GCC documentation says that it can be signed on x86 and unsigned on PowerPC and ARM.
OK, I'm writing a program with GLib, which defines char as gchar (nothing more than that, just a way of standardizing).
My question is, what about UTF-8? Does it use more than one block of memory?
Say that I have a variable
unsigned char *string = "My string with UTF8 enconding ~> çã";
See, if I declare my variable as
unsigned
will I have only 127 values (so my program will need to store more blocks of memory), or does UTF-8 change to negative values too?
Sorry if I can't explain it correctly, but I think that it is a bit complex.
NOTE:
Thanks for all the answers.
I don't understand how it is interpreted normally.
I think that, as with ASCII, if I have both signed and unsigned chars in my program, the strings have different values, and that leads to confusion; imagine it with UTF-8.
I've had a couple requests to explain a comment I made.
The fact that a char type can default to either a signed or unsigned type can be significant when you're comparing characters and expect a certain ordering. In particular, UTF8 uses the high bit (assuming that char is an 8-bit type, which is true in the vast majority of platforms) to indicate that a character code point requires more than one byte to be represented.
A quick and dirty example of the problem:
#include <stdio.h>
int main( void)
{
signed char flag = 0xf0;
unsigned char uflag = 0xf0;
if (flag < (signed char) 'z') {
printf( "flag is smaller than 'z'\n");
}
else {
printf( "flag is larger than 'z'\n");
}
if (uflag < (unsigned char) 'z') {
printf( "uflag is smaller than 'z'\n");
}
else {
printf( "uflag is larger than 'z'\n");
}
return 0;
}
On most projects that I work on, the unadorned char type is typically avoided in favor of using a typedef that explicitly specifies an unsigned char. Something like the uint8_t from stdint.h or
typedef unsigned char u8;
Generally dealing with an unsigned char type seems to work well and have few problems - the one area that I have seen occasional problems is when using something of that type to control a loop:
while (uchar_var-- >= 0) {
// infinite loop...
}
Two things:
Whether a char type is signed or unsigned won't affect your ability to translate UTF8-encoded-strings to and from whatever display string type you're using (WCHAR or whatnot). Don't worry about it, in other words: the UTF8 bytes are just bytes, and whatever you're using as an encoder/decoder will do the right thing.
Some of your confusion may be that you're trying to do this:
unsigned char *string = "This is a UTF8 string";
Don't do this-- you're mixing different concepts. A UTF-8 encoded string is just a sequence of bytes. C string literals (as above) were not really designed to represent this; they're designed to represent "ASCII-encoded" strings. Although for some cases (like mine here) they end up being the same thing, in your example in the question, they may not. And certainly in other cases they won't be. Load your Unicode strings from an external resource. In general I'd be wary of embedding non-ASCII characters in a .c source file; even if the compiler knows what to do with them, other software in your toolchain may not.
Using unsigned char has its pros and cons. The biggest benefits are that you don't get sign extension or other surprises such as signed overflow that would produce unexpected results from calculations. Unsigned char is also compatible with the <ctype.h> (<cctype> in C++) macros/functions such as isalpha(ch), all of which require values in unsigned char range. On the other hand, all I/O functions require char*, so you have to cast whenever you do I/O.
As for UTF-8, storing it in signed or unsigned arrays is fine but you have to be careful with those string literals as there is little guarantee about them being valid UTF-8. C++0x adds UTF-8 string literals to avoid possible issues and I would expect the next C standard to adopt those as well.
In general you should be fine, though, as long as you make sure that your source code files are always UTF-8 encoded.
Signed/unsigned affects only arithmetic operations. If char is unsigned, the higher values are positive; if it is signed, they are negative. But the number of values in the range is still the same.
Not really, unsigned / signed does not specify how many values a variable can hold. It specifies how they are interpreted.
So, an unsigned char has the same amount of values as a signed char, except that the one has negative numbers and the other doesn't. It is still 8 bits (if we assume that a char holds 8 bits, I'm not sure it does everywhere).
It makes no differences when using a char* as a string. The only time signed/unsigned would make a difference is if you would be interpreting it as a number, like for arithmetic or if you were to print it as an integer.
UTF-8 characters cannot be assumed to fit in one byte. A UTF-8 character can be 1 to 4 bytes wide. So a single char, signed or unsigned, is not sufficient if you assume one unit can always store one UTF-8 character.
Most platforms (such as PHP, .NET, etc.) have you build strings normally (such as char[] in C) and you use a library to convert between encodings and parse characters out of the string.
As to your question:
think if I have a singed or unsigned ARRAY of chars can be it make my program run wrong? – drigoSkalWalker
Yes. Mine did. Here's a simple runnable excerpt from my app that comes out totally wrong when using ordinary signed chars.
Try running it after changing all chars to unsigned in the parameters, like this:
int is_valid(unsigned char c);
it should then work properly.
#include <stdio.h>
int is_valid(char c);
int main() {
char ch = 0xFE;
int ans = is_valid(ch);
printf("%d", ans);
}
int is_valid(char c) {
if((c == 0xFF) || (c == 0xFE)) {
printf("NOT valid\n");
return 0;
}
else {
printf("valid\n");
return 1;
}
}
What it does is check whether the char is a valid byte within UTF-8.
0xFF and 0xFE are NOT valid bytes in UTF-8.
Imagine the problem if the function accepts one of them as a valid byte.
What happens is this:
0xFE
=
11111110
=
254
If you save this in an ordinary char (that is, a signed one), the leftmost bit, the most significant bit, makes it negative. But what negative number is it?
To find its magnitude, flip the bits and add one (two's complement):
11111110
00000001
00000001 + 00000001 =
00000010 = 2
Remember that the value was negative, so it becomes -2.
So (-2 == 0xFE) in the function of course isn't true.
The same goes for (-2 == 0xFF).
So a function that checks for invalid bytes ends up accepting invalid bytes as if they were OK :-o.
Two other reasons I can think of to stick to unsigned when dealing with UTF-8 are:
If you need to shift bits to the right, there can be trouble, because with signed chars you might end up shifting in 1s from the left.
UTF-8 and Unicode only use positive numbers, so... why don't you as well? Keeping it simple :)
Compiling on linux using gcc.
I would like to convert a value to hex: 10, which would be a.
I have managed to do this with the code below.
unsigned int index = 10;
char index_buff[5] = {0};
sprintf(index_buff, "0x%x", index);
data_t.un32index = index_buff;
However, the problem is that I need to assign it to a structure
and the element I need to assign to is an unsigned int type.
This works however:
data_t.un32index = 0xa;
However, my sample code doesn't work, as it thinks I am trying to convert
from a string to an unsigned int.
I have tried this, but this also failed
data_t.un32index = (unsigned int) *index_buff;
Many thanks for any advice,
Huh? The decimal/hex doesn't matter if you have the value in a variable. Just do
data_t.un32index = index;
Decimal and hex are just notation when printing numbers so humans can read them.
For a C (or C++, or Java, or any of a number of languages where these types are "primitives" with semantics closely matching those of machine registers) integer variable, the value it holds can never be said to "be in hex".
The value is held in binary (in all typical modern electronic computers, which are digital and binary in nature) in the memory or register backing the variable, and you can then generate various string representations, which is when you need to pick a base to use.
I agree with the previous answers, but I thought I'd share code that actually converts a hex string to an unsigned integer just to show how it's done:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *hex_value_string = "deadbeef";
unsigned int out;
sscanf(hex_value_string, "%x", &out);
printf("%o %o\n", out, 0xdeadbeef);
printf("%x %x\n", out, 0xdeadbeef);
return 0;
}
Gives this when executed:
emil#lanfear /home/emil/dev $ ./hex
33653337357 33653337357
deadbeef deadbeef
However, my sample code doesn't work, as it thinks I am trying to convert from a string to an unsigned int.
This is because when you write the following:
data_t.un32index = index_buff;
you do have a type mismatch. You are trying to assign a character array index_buff to an unsigned int i.e. data_t.un32index.
You should be able to assign the index as suggested directly to data_t.un32index.