I would like to convert an int to a byte in C.
How could I get the value?
in Java
int num = 167;
byte b = (byte) num; // -89
in C
int num = 167;
???
There is no such type as byte in standard C. However, if you don't want to include any extra headers, you can define one yourself:
typedef unsigned char Byte;
And then create any variable you'd like with it:
int bar = 15;
Byte foo = (Byte)bar;
You can simply cast to a byte:
unsigned char b=(unsigned char)num;
Note that if num is greater than 255 or less than 0, C won't crash; it will simply give a wrapped-around result (the value is reduced modulo 256).
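For instance, a minimal sketch (the wrapped value assumes the usual 8-bit unsigned char):
#include <stdio.h>

int main(void) {
    int num = 300;                        // does not fit in 8 bits
    unsigned char b = (unsigned char)num; // value is reduced modulo 256
    printf("%d\n", b);                    // prints 44 (300 - 256)
    return 0;
}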
In computer science, the term byte is well-defined as an 8 bit chunk of raw data. Apparently Java uses a different definition than computer science...
-89 is not the value 167 "converted to a byte". 167 already fits in a byte, so no conversion is necessary.
-89 is the value 167 converted to a signed two's complement 8-bit representation.
The most correct type to use for signed 2's complement 8 bit integers in C is int8_t from stdint.h.
Converting from int to int8_t is done implicitly in C upon assignment. There is no need for a cast.
int num = 167;
int8_t b = num;
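A complete sketch; the printed value assumes the usual two's complement behavior, where the out-of-range 167 wraps:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int num = 167;
    int8_t b = num;     // implicit conversion; the result is implementation-defined
    printf("%d\n", b);  // typically prints -89 (167 - 256)
    return 0;
}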
byte is a Java signed integer type with a range of -128 to 127.
The corresponding type in C is int8_t defined in <stdint.h> for architectures with 8-bit bytes. It is an alias for signed char.
You can write:
#include <stdint.h>
void f() {
int num = 167;
int8_t b = num; // or signed char b = num;
...
}
If your compiler emits a warning about the implicit conversion to a smaller type, you can add an explicit cast:
int8_t b = (int8_t)num; // or signed char b = (signed char)num;
Note however that it is much more common to think of 8-bit bytes as unsigned quantities in the range 0 to 255, for which one would use type uint8_t or unsigned char. The reason Java's byte is a signed type might be that the language has no unsigned integer types, but it is quite confusing for readers coming from other languages.
byte can also be defined as a typedef:
typedef unsigned char byte; // 0 to 255
or
typedef signed char byte; // -128 to 127
Do not use plain char, because it is implementation-defined whether this type is signed or unsigned by default. Reserve type char for the characters in C strings, although many functions actually treat these as unsigned: strcmp(), the functions from <ctype.h>...
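As an illustration of that last point (a sketch, not part of the original answer): the <ctype.h> functions expect a value representable as unsigned char, so a plain char should be cast before being passed in:
#include <ctype.h>
#include <stdio.h>

int main(void) {
    char s[] = "Hello, world";
    // Cast to unsigned char: passing a negative plain char to toupper()
    // would be undefined behavior on platforms where char is signed.
    for (int i = 0; s[i] != '\0'; i++)
        putchar(toupper((unsigned char)s[i]));
    putchar('\n');
    return 0;
}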
Related
If I understand correctly, the int_fastN_t types are guaranteed to be at least N bits wide. Depending on the compiler and the architecture of the computer, these types can also be wider than N bits. For instance, an int_fast8_t could actually be a 32-bit int.
Is there some kind of mechanism that enforces that the value of an int_fastN_t never overflows its N-bit range, even if the underlying type is wider than N bits?
Consider the following code for example:
int main(){
int_fast8_t a = 64;
a *= 2; // -128
return 0;
}
I do not want a to be greater than 127. If a is interpreted as a "regular" int (32 bits), is it possible that a exceeds 127 instead of wrapping to -128?
Thanks for your answers.
int_fast8_t a = 64;
a *= 2;
If a is interpreted as a "regular" int (32 bits), is it possible that a exceeds 127 instead of wrapping to -128?
Yes. It is very likely that a * 2 will be stored in a as 128. I would expect this on all processors unless the processor is an 8-bit one.
Is there some kind of mechanism which enforces that the value of an int_fastN_t never overflows?
No. Signed integer overflow is still possible as well as values outside the [-128...127] range.
I do not want a to be greater than 127
Use int8_t. The value saved will never exceed 127, yet the code still has implementation-defined behavior when assigning 128 to an int8_t. This usually results in -128 (values wrap mod 256), but other results are possible (this is uncommon).
int8_t a = 64;
a *= 2;
If int8_t is not available, or its implementation-defined assignment behavior is unwanted, the code can force the wrapping itself:
int_fast8_t a = foo(); // a takes on some value
a %= 256;
if (a < -128) a += 256;
else if (a > 127) a -= 256;
It is absolutely possible for the result to exceed 127. int_fast8_t (and uint_fast8_t and all the rest) set an explicit minimum size for the type, but it may be larger, and the compiler will not prevent the value from exceeding the stated 8-bit bounds. At runtime the variable behaves exactly like the wider type it is defined as; its "8-ness" is irrelevant. The only guarantee is that the type can represent every value in the 8-bit range.
If you need it to explicitly truncate/wrap to 8-bit values, either use (or cast to) int8_t to restrict the representable range (though signed overflow still wouldn't be defined), or explicitly use masks to perform the same work yourself when needed.
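A sketch of the masking approach (assuming the usual two's complement wrap to the range -128..127 is what you want):
#include <stdint.h>
#include <stdio.h>

// Keep only the low 8 bits of v and sign-extend them back,
// without relying on an implementation-defined narrowing conversion.
static int_fast8_t wrap8(int_fast32_t v) {
    return (int_fast8_t)(((v & 0xFF) ^ 0x80) - 0x80);
}

int main(void) {
    int_fast8_t a = 64;
    a = wrap8(a * 2);        // 128 wraps to -128
    printf("%d\n", (int)a);  // prints -128
    return 0;
}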
Nope. All that the fast types really are is typedefs. For example, stdint.h on my machine includes
/* Fast types. */
/* Signed. */
typedef signed char int_fast8_t;
#if __WORDSIZE == 64
typedef long int int_fast16_t;
typedef long int int_fast32_t;
typedef long int int_fast64_t;
#else
typedef int int_fast16_t;
typedef int int_fast32_t;
__extension__
typedef long long int int_fast64_t;
#endif
/* Unsigned. */
typedef unsigned char uint_fast8_t;
#if __WORDSIZE == 64
typedef unsigned long int uint_fast16_t;
typedef unsigned long int uint_fast32_t;
typedef unsigned long int uint_fast64_t;
#else
typedef unsigned int uint_fast16_t;
typedef unsigned int uint_fast32_t;
__extension__
typedef unsigned long long int uint_fast64_t;
#endif
The closest you can come without a significant performance penalty is probably casting the result to an 8-bit type.
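For example (a sketch; the wrap to -128 is the usual implementation-defined two's complement result):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int_fast8_t a = 64;
    a = (int8_t)(a * 2);     // force the result back into 8 bits
    printf("%d\n", (int)a);  // typically prints -128
    return 0;
}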
Just use unsigned char if you want to manipulate 8 bits (unsigned char is one byte long); you will work in the unsigned range 0 to 0xFF (255).
From the C99 standard:
The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.
So use int8_t to guarantee 8 bit int.
A compliant C99/C11 compiler on a POSIX platform must have int8_t.
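Outside POSIX, int8_t is optional (it only has to exist where the implementation has a suitable 8-bit two's complement type); a sketch of how its presence can be checked through the companion macro:
#include <stdint.h>
#include <stdio.h>

#ifndef INT8_MAX               // defined if and only if int8_t exists
#error "no exact 8-bit signed type on this platform"
#endif

int main(void) {
    int8_t b = 100;
    printf("%d\n", b);         // prints 100
    return 0;
}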
When coding in C, I have accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing a right shift >> on a non-ASCII character, the vacated bits on the left end are also filled with 1s rather than 0s, even though the char variable is explicitly converted to unsigned int (whereas for a signed int, the extra bits are filled with 1s on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they caused by the representations of char, int, and unsigned int, by my operating system (Ubuntu 14.04), or by the ANSI C requirements? I have tried compiling this program with both gcc (4.8.4) and clang (3.4), and there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in range 0..255.
Likewise, if you want to store signed integers in range -127..127/128, use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for an assignment to a narrower unsigned type, say
uint16_t u_c = c[0];
since -0x3C (i.e. -60) is not in the range of uint16_t, the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; in other words, we add or subtract UINT16_MAX + 1 (note that the integer promotions can kick in here, so you might need casts in real C code) until the value is in range. UINT16_MAX is always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is 0xFFC4, the same pattern you saw in the low bits. The uint16_t value is then zero-extended when converted to a 32-bit unsigned value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
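A minimal sketch of the portable fix (the 0xC4 byte assumes the UTF-8 string from the question):
#include <stdio.h>

int main(void) {
    char c[] = "ā";               // UTF-8: 0xC4 0x81
    int i = (unsigned char)c[0];  // zero-extended regardless of char's signedness
    printf("%x\n", i);            // prints c4 on any platform
    return 0;
}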
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
Edit: I updated the example to be C. I am concerned specifically with C and not C++ (sorry for the confusion, see situation below).
I am looking for a safe way to convert a signed integer to an unsigned integer while always maintaining the exact same bit pattern between conversions. As I understand it, simply casting has undefined or implementation-defined behavior, so it is not safe to rely on (case A below). But what about bit-wise operators like OR (case B below)? Can bit-wise OR be used to safely convert signed to unsigned? What about the reverse?
Example:
#include <stdio.h>
int main() {
// NOTE: assuming 32bit ints
// example bit pattern: 11111111110001110001001111011010
// signed int value: -3730470
// unsigned int value: 4291236826
// example 1
// signed -> unsigned
int s1 = -3730470;
unsigned int u1a = (unsigned int)s1;
unsigned int u1b = (unsigned int)0 | s1;
printf("%u\n%u\n", u1a, u1b);
// example 2
// unsigned -> signed
unsigned int u2 = 4291236826;
int s2a = (int)u2;
int s2b = (int)0 | u2;
printf("%i\n%i\n", s2a, s2b);
}
Situation: I am writing a PostgreSQL C-Language function/extension to add popcount functionality (my first attempt code here). PostgreSQL does not support unsigned types (ref). All the efficient methods of calculating popcount I found require unsigned data types to work correctly. Therefore, I must be able to convert the signed data types to an unsigned data type without changing the bit pattern.
Off topic: I do realize that an alternate solution would be to use PostgreSQL bit string bit and varbit data types instead of the integer data types, but for my purposes the integer data types are much easier to use and manage.
a safe way to convert a signed integer to an unsigned integer while always maintaining the exact same bit pattern between conversions
A union will work as below even if the int uses a rare non-two's-complement representation. Only on very exceptional platforms (ticking away in a silicon graveyard) where INT_MAX == UINT_MAX would this be a problem.
int some_int = -3730470; // any int value
union {
int i;
unsigned u;
} x = { some_int };
printf("%d\n", some_int);
printf("%u\n", x.u);
Yet if one can limit oneself to common 2's complement int, the below is sufficient.
unsigned u = (unsigned) some_int;
But what about bit-wise operators like OR (case B below)?
Can bit-wise OR be used to safely convert signed to unsigned?
The following | is like a hidden cast due to integer promotions:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. C11dr §6.3.1.1 3
int s1 = -3730470;
unsigned int u1b = (unsigned int)0 | s1;
// just like
= (unsigned int)0 | (unsigned int)s1;
= (unsigned int)s1;
What about the reverse?
Converting an unsigned int to a signed int is well defined if the value is representable in the signed type, i.e. in [0...INT_MAX]. Converting an out-of-int-range unsigned value to int is ...
either the result is implementation-defined or an implementation-defined signal is raised. §6.3.1.3 3
Best to use unsigned types for bit manipulations.
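To tie this back to the popcount use case from the question, a sketch (the helper name is made up for illustration): convert the signed value to unsigned first, then do all the bit work on the unsigned copy:
#include <stdio.h>

// Count the set bits of a signed int by first converting it to unsigned.
// The conversion is well defined (value modulo UINT_MAX + 1) and, on a
// two's complement machine, preserves the bit pattern.
static int popcount_int(int v) {
    unsigned u = (unsigned)v;
    int count = 0;
    while (u != 0) {
        count += (int)(u & 1u);
        u >>= 1;                 // shifting an unsigned value is well defined
    }
    return count;
}

int main(void) {
    // 21 set bits in the example pattern, assuming 32-bit int as in the question
    printf("%d\n", popcount_int(-3730470));
    return 0;
}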
The below code may often work as hoped, but should not be used for robust coding.
// NOTE: assuming 32bit ints, etc.
unsigned int u2 = 4291236826;
int s2a = (int)u2; // avoid this
Alternative
int s2a;
if (u2 > INT_MAX) {
// Handle with some other code
} else {
s2a = (int) u2; // OK
}
BTW: it is better to append u to unsigned constants like 4291236826 to convey to the compiler that an unsigned constant is intended, rather than a long long as plain 4291236826 would be.
unsigned int u2 = 4291236826u;
What about ...
int s1 = -3730470;
unsigned int u1 = *(unsigned int*)&s1;
unsigned int u2 = 4291236826;
int s2a = *(int*)&u2;
My sourcecode:
#include <stdio.h>
int main()
{
char myArray[150];
int n = sizeof(myArray);
for(int i = 0; i < n; i++)
{
myArray[i] = i + 1;
printf("%d\n", myArray[i]);
}
return 0;
}
I'm using Ubuntu 14 and gcc to compile it, what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?
The value of a char can range from 0 to 255 or (typically) from -128 to 127, depending on the implementation.
Therefore, once the value passes 127 in your case, it wraps around and you get negative values as output.
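If the full sequence 1..150 is wanted, a sketch of one fix is to use an unsigned (or wider) element type:
#include <stdio.h>

int main(void) {
    unsigned char myArray[150];     // 0..255 covers every value 1..150
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++) {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]); // now counts up to 150 as expected
    }
    return 0;
}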
The signedness of a plain char is implementation-defined.
In your case, a char is a signed char, which can hold values in the range -128 to +127.
As you increment i beyond the limit a signed char can hold and assign it to myArray[i], you run into implementation-defined behaviour.
To quote C11, chapter §6.3.1.3:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Because a char is a SIGNED BYTE here. That means its value range is -128 to 127.
EDIT: Due to all the comments below suggesting this is wrong / not the issue / about signedness / whatnot...
Running this code:
char a, b;
unsigned char c, d;
int si, ui, t;
t = 200;
a = b = t;
c = d = t;
si = a + b;
ui = c + d;
printf("Signed:%d | Unsigned:%d", si, ui);
Prints: Signed:-112 | Unsigned:400
Try yourself
The reason is the same. a and b are signed chars (signed variables one byte, i.e. 8 bits, in size); c and d are unsigned. Assigning 200 to the signed variables overflows them and they get the value -56. In memory, a, b, c and d all hold the same bit pattern, but when used, each type's signedness dictates how the value is interpreted, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as in other answers) that the standard doesn't mandate that char be signed. That is true. However, in the case presented by the OP, as well as in the code above, char IS signed.
It seems that your compiler by default treats type char as type signed char. In this case CHAR_MIN equals SCHAR_MIN, which is -128, while CHAR_MAX equals SCHAR_MAX, which is 127 (see header <limits.h>).
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in hexadecimal notation
0x7F
and is equal to 127. The most significant bit is the sign bit and is equal to 0.
For negative values the sign bit is set to 1; for example, -128 is represented as
0x80
When the value stored in the char in your program reaches its positive maximum 0x7F and is incremented, it becomes 0x80, which in signed decimal notation is -128.
You should explicitly use type unsigned char instead of char if you want the result of the program not to depend on compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);
If I do this in both clang and Visual Studio:
unsigned char *a = 0;
char * b = 0;
char x = '3';
a = & x;
b = (unsigned char*) a;
I get a warning that I am trying to convert between signed and unsigned character pointers, but the code sure works. Still, the compiler is saying it for a reason. Can you point out a situation where this could turn into a problem?
To make it very simple: char represents:
A single character (it doesn't matter whether char is signed or not). When you assign a character like 'A', you are writing the ASCII code of A (65) into that memory location.
A string (when used as array or pointer to a char buffer).
An eight bit number (with or without sign).
Then when you convert a signed byte like -1 to an unsigned byte, you lose information (at least the sign, and probably the value too); that's why you get a warning:
signed char a = -1;
unsigned char b = (unsigned char)a;
if ((int)b == -1)
; // No! Now b is 255!
The value may not be 255 but something else if your system doesn't represent negative numbers with two's complement; in this example it doesn't really matter (I have never worked with such a system, but they exist) because the point is that a signed/unsigned conversion may discard information. It doesn't matter whether this happens through an explicit cast or a cast through pointers: the bits will represent something else (and the result will change according to implementation, environment, and actual value).
Note that in the C standard char, signed char, and unsigned char are formally distinct types. Most of the time you won't care (VS will default char to signed or unsigned according to a compiler option, but this isn't portable), yet you may need casts.
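A small sketch of that distinction: the three character types are incompatible as pointer targets, so converting pointers between them without a cast draws a warning even though the object representation is the same byte:
#include <stdio.h>

int main(void) {
    char x = '3';
    unsigned char *a = (unsigned char *)&x;   // explicit cast silences the warning
    signed char   *s = (signed char *)&x;     // same idea for signed char
    printf("%u %d\n", (unsigned)*a, (int)*s); // the same byte read through two types
    return 0;
}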
Your code is correct (any type can be aliased by unsigned char). Also, on 2's complement systems, this alias is the same as the result of a value conversion.
The reverse operation, aliasing unsigned char by char, is only a problem on esoteric systems that have trap representations for plain char.
I don't know of any such systems ever existing, although the C standard provides for their existence. Unfortunately a cast is required because of this possibility, which is more annoying than useful IMHO.
The aliasing of unsigned char by char gives the same result as the value conversion on every modern system that I know of (technically the value conversion is implementation-defined, but every implementation I know of defines it so that the representation is retained).
NB: definition of terms, taking unsigned char x = 250; as an example:
alias: char y = *(char *)&x;
conversion: char y = x;
The char type can be either signed or unsigned depending on the platform. Code that casts a char type to either unsigned char or signed char might work fine within one platform, but not if the data is transferred across operating systems, etc. See this URL:
http://www.trilithium.com/johan/2005/01/char-types/
Because you can lose some values - look at this:
unsigned char *a = 0;
char b = -3;
a = &b;
printf("%d", *a);
Result: 253
Let me explain this. Just look at ranges:
unsigned char: from 0 to 255
signed char: from -128 to 127