I have a simple code
char t = (char)(3000);
Then value of t is -72. The hex value of 3000 is 0xBB8. I couldn't understand why the value of t is -72.
Thanks for your answers.
I don't know about Mac. So my result is -72. As I know, MAC is using Big Endian, so does it affect the result? I dont have any MAC computer to test so I want to know from MAC people.
The hex value of 3000 is 0xBB8.
And so the hex value of the char (which, by the way, appears to be signed on your compiler) is 0xB8.
If it were unsigned, 0xB8 would be 184. But since it's signed, its actual value is 256 less, i.e. -72.
If you want to know why this is, read about two's complement notation.
A char is 8 bits (which can only represent a 0-255 range). Trying to cast 3000 to a char is... impossible impossible, at least for what you are intending.
This is happening because 3000 is too big a value and causes an overflow. Char is generally from -128 to 127 signed, or 0 to 255 unsigned, but it can change depending upon the implementation.
char is an integral type with certain range of representable values. int is also an integral type with certain range of representable values. Normally, range of int is [much] wider than that of char. When you try to squeeze into a char an int value that doesn't fit into the range of char, the value will not "fit", of course. The actual result is implementation-defined.
In your case 3000 is an int value that doesn't' fit into the range of char on your implementation. So, you won't get 3000 as the result. If you really want to know why it specifically came out as -72 - consult the documentation that came with your implementation.
As specified, the 16-bit hex value of 3000 is 0x0BB8. Although implementation specific, from your posted results this is likely stored in memory in 8-bit pairs as B8 0B (some architectures would store it as 0B B8. This is known as endianness.)
char, on the other hand, is probably not a 16-bit type. Again, this is implementation specific, but from your posted results it appears to be 8-bits, which is not uncommon.
So while your program has allocated 8-bits of memory for your value, you're storing twice as much information in that memory. When your program retrieves this value later, it will only be pulling the first stored octet, in this case B8. The 0B will be ignored, and may cause problems later down the line if it ended up overwriting something important. This is known as a buffer overflow, which is very bad.
Assuming two's complement (technically implementation specific, but a reasonable assumption), the hex value of B8 translates to either -72 or 184 in decimal, depending on whether your dealing with a signed or unsigned type. Since you didn't specify either, your compiler will go with it's default. Yet again, this is implementation specific, and it appears your compiler goes with signed char.
Therefore, you get -72. But don't expect the same results on any other system.
A char is (typically) just 8 bits, so you cant store values as large as 3000 (which would require at least 11 12 bits). So if you trie to store 3000 in a byte, it will just wrap.
Since 3000 is 0xBBA, it requires two bytes, one 0x0B and one which is 0xBA. If you try to store it in a single byte, you will just get one of them (0xBA). And since a byte is (typically) signed, that is -72.
char is used to hold a single character, and you're trying to store a 4-digit int in one. Perhaps you meant to use an array of chars, or string (char t[4] in this case).
To convert an int to a string (untested):
#include <stdlib.h>
int main() {
int num = 3000;
char numString[4];
itoa(num, buf, 10);
}
oh, i get it, it's overflow, it's like char is only from -256 to 256 or something like that i'm not sure, like if you have a var which type's max limit is 256 and you add 1 to it, than it becomes -256 and so on
Related
Will the accessibility of memory space get changed or just informing the compiler take the variable of mentioned type?
Example:
int main()
{
char a;
a = 123456789;
printf("ans is %d\n",(int)a);
}
Output:
overflow in implicit constant conversion a= 123456789.
ans is 21.
Here I know why it's causing overflow. But I want to know how memory is accessed when an overflow occurs.
This is kind of simple: Since char typically only holds one byte, only a single byte of 123456789 will be copied to a. Exactly how depends on if char is signed or unsigned (it's implementation-specific which one it is). For the exact details see e.g. this integer conversion reference.
What typically happens (I haven't seen any compiler do any different) is that the last byte of the value is copied, unmodified, into a.
For 123456789, if you view the hexadecimal representation of the value it will be 0x75bcd15. Here you can easily see that the last byte is 0x15 which is 21 in decimal.
What happens with the casting to int when you print the value is actually nothing that wouldn't happen anyway... When using variable-argument functions like printf values of a smaller type than int will be promoted to an int. Your printf call is exactly equal to
printf("ans is %d\n",a);
Here is something weird I found:
When I have a char* s of three elements, and assigned it to be "21",
The printed short int value of s appears to be 12594, which is same to 0010001 0010010 in binary, and 49 50 for separate char. But according to the ASCII chart, the value of '2' is 50 and '1' is 49.
when I shift the char to right, *(short*)s >>= 8, the result is agreed with (1.), which is '1' or 49. But after I assigned the char *s = '1', the printed string of s also appears to be "1", which I earlier thought it will become "11".
I am kind of confused about how bits stored in a char now, hope someone can explain this.
Following is the code I use:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
printf("%lu,%lu\n",sizeof(char), sizeof(short));
char* s = malloc(sizeof(char)*3);
*s = '2', *(s+1) = '1', *(s+2) = '\0';
printf("%s\n",s);
printf("%d\n",*(short int*)s);
*(short*)s >>= 8;
printf("%s\n",s);
printf("%d\n",*(short int*)s);
*s = '1';
printf("%s\n",s);
return 0;
}
And the output is:
1,2
21
12594
1
49
1
This program is compiled on macOS with gcc.
You need some understanding of the concept of "endianess" here, that values can be represented as "little endian" and "big endian".
I am going to skip the discussion of how legal it is, about involved undefined bahaviour.
(Here is however a relevant link, provided by Lundin, credits:
What is the strict aliasing rule?)
But lets look at a pair of byte in memory, of which the lower-addressed contains a 50 and the higher addressed contains a 49:
50 49
You introduce them exactly this way, by explicitly setting lower byte and higher byte (via char type).
Then you read them, forcing the compiler to consider it a short, which is a two byte sized type on your system.
Compilers and hardware can be created with different "opinions" on what is a good representation of two byte values in two cosecutive bytes. It is called "endianess".
Two compilers, both of which are perfectly standard-conforming can act like this:
The short to be returned is
take the value from lower address, multiply it by 256, add the value from higher address
take the value from the higher address, multiply it by 256, add the value from the lower address
They do not actually do so, it is a much more efficient mechanism implemented in hardware, but the point is that even the implementation in hardware implicity does this or that.
You are re-interpreting representations by aliasing types in a way that is not allowed by the standard: you can process a short value as if it were a char array, but not the opposite. Doing that can cause weird errors with optimizing compilers that could assume that the value has never been initialized, or could optimize out a full branch of code that contains Undefined Behaviour.
Then the answer to your question is called endianess. In a big endian representation, the most significant byte has the lowest address (258 or 0x102 will be represented as the 2 byte 0x01, 0x02 in that order) while in little endian representation the least significant byte has the lowest address (0x102 is represented as 0x02, 0x01 in that order).
Your system happens to be a little endian one.
Its quite embarrassing but I really want to know... So I needed to make a conversion program that converts decimal(base 10) to binary and hex. I used arrays to store values and everything worked out fine, but i declared the array as int arr[1000]; because i thought 1000 was just an ok number, not too big, not to small...someone in class said " why would you declare an array of 1000? Integers are 32 bits". I was too embarrased to ask what that meant so i didnt say anything. But does this mean that i can just declare the array as int arr[32]; instead? Im using C btw
No, the int type has tipically a 32 bit size, but when you declare
int arr[1000];
you are reserving space for 1000 integers, i.e. 32'000 bits, while with
int arr[32];
you can store up to 32 integers.
You are practically asking yourself a question like this: if an apple weighs 32 grams, I want to my bag to
contain 1000 apples or 32 apples?
Don't be embarrassed. Fear is your enemy and in the end you will be perceived based on contexts that you have no hope of significantly influencing. Anyway, to answer your question, your approach is incorrect. You should declare the array with a size completely determined by the number of positions used.
Concretely, if you access the array at 87 distinct positions (from 0 to 86) then you need a size of 87.
0 to 4,294,967,295 is the maximum possible range of numbers you can store in 32 bits.If your number is outside this range you cannot store your number in 32 bits.Since each bit will occupy one index location of your array if you number falls in that range array size of 32 will do fine.for example consider number 9 it will be stored in array as a[]={1,0,0,1}.
In order to know the know range of numbers, your formula is 0 to (2^n -1) where n is the number of bits in binary. means in the array size of 4 or 4 bits you can just store number from range 0 to 15.
In C , integer datatype can store typically up to 2,147,483,647 and 4,294,967,295 if you are using unsigned integer. Since the maximum value, an integer data type can store in C is within the range of maximum possible number which can be expressed using 32 bits. It is safe to say that array size of 32 is the best size for defining an array.Sice you will never require more than 32 bits to express any number using an int.
I will use
int a = 42;
char bin[sizeof a * CHAR_BIT + 1];
char hex[sizeof a * CHAR_BIT / 4 + 1]
I think this include all possibility.
Consider that also the 'int' type is ambiguous. Generally it depends on the machine you're working on and at minimum its ranges are: -32767,+32767:
https://en.wikipedia.org/wiki/C_data_types
Can I suggest to use the stdint types?
int32_t/uint32_t
What you did is okay. If that is precisely what you want to do. C is a language that lets you do whatever you want. Whenever you want. The reason you were berated on the declaration is because of 'hogging' memory. The thought being, how DARE YOU take up space that is possibly never used... it is inefficient.
And it is. But who cares if you just want to run a program that has a simple purpose? A 1000 16 or 32 bit block of memory is weeeeeensy teeeeny tiny compared to computers from the way back times when it was necessary to watch over how much RAM you were taking up. So - go ahead.
But what they should have said next is how to avoid that. More on that at the end - but first a thing about built in data types in C.
An int can be 16 or 32 bits depending on how you declare it. And your compiler's settings...
A LONG int is 32.
consider:
short int x = 10; // declares an integer that is 16 bits
signed int x = 10; // 32 bit integer with negative and positive range
unsigned int x = 10 // same 32 bit integer - but only 0 to positive values
To specifically code a 32 bit integer you declare it 'long'
long int = 10; // 32 bit
unsigned long int = 10; // 32 bit 0 to positive values
Typical nomenclature is to call a 16 bit value a WORD and a 32 bit value a DWORD - (double word). But why would you want to type in:
long int x = 10;
instead of:
int x = 10;
?? For a few reasons. Some compilers may handle the int as a 16 bit WORD if keeping up with older standards. But the only real reason is to maintain a convention of strongly typecasted code. Make it read directly what you intend it to do. This also helps in readability. You will KNOW when you see it = what size it is for sure, and be reminded whilst coding. Many many code mishaps happen for lack of attention to code practices and naming things well. Save yourself hours of headache later on by learning good habits now. Create YOUR OWN style of coding. Take a look at other styles just to get an idea on what the industry may expect. But in the end you will find you own way in it.
On to the array issue ---> So, I expect you know that the array takes up memory right when the program runs. Right then, wham - the RAM for that array is set aside just for your program. It is locked out from use by any other resource, service, etc the operating system is handling.
But wouldn't it be neat if you could just use the memory you needed when you wanted, and then let it go when done? Inside the program - as it runs. So when your program first started, the array (so to speak) would be zero. And when you needed a 'slot' in the array, you could just add one.... use it, and then let it go - or add another - or ten more... etc.
That is called dynamic memory allocation. And it requires the use of a data type that you may not have encountered yet. Look up "Pointers in C" to get an intro.
If you are coding in regular C there are a few functions that assist in performing dynamic allocation of memory:
malloc and free ~ in the alloc.h library routines
in C++ they are implemented differently. Look for:
new and delete
A common construct for handling dynamic 'arrays' is called a "linked-list." Look that up too...
Don't let someone get your flustered with code concepts. Next time just say your program is designed to handle exactly what you have intended. That usually stops the discussion.
Atomkey
Why does int a = 'adf'; compile and run in C?
The literal 'adf' is a multi-byte character constant. Its value is platform dependent. Don't use it.
For example, one some platform a 32-bit unsigned integer could take the value 0x00616466, and on another it could be 0x66646100, and on yet another it could be 0x84860081...
This, as Kerrek said, is a multi-byte character constant. It works because each character takes up 8 bits. 'adf' is 3 characters, which is 24 bits. An int is usually large enough to contain this.
But all of the above is platform dependent, and could be different from architecture to architecture. This kind of thing is still used in ancient Apple code, can't quite remember where, although file creator codes ring a bell.
Note the difference in syntax between " and '.
char *x = "this is a string. The value assigned to x is a pointer to the string in memory"
char y = '!' // the value assigned to y is the numerical character value of the character '!'
char z = 'asd' // the value of z is the numerical value of the 'string' data, which can in theory be expressed as an int if it's short enough
It works just because "adf" is 3 ASCII characters and thus 3 bytes long and your platform is a 24 bit or larger system. It would fail on a 16bit system for instance.
Its also worth remembering that although sizeof(char) will always return 1, dependending on platform and compiler more than 1 byte of memory space could be assigned to a char hence for
struct st
{
int a;
char c;
};
when you:
sizeof(st) a number of 32 bit systems will return 8. This is because the system will pad out the single byte for char c to 4 bytes.
ASCII. Every character has a numerical value. Halfway through this tutorial is a description if you need more information http://en.wikibooks.org/wiki/C_Programming/Variables
Edit_______________________________________
char letter2 = 97; /* in ASCII, 97 = 'a' */
This is considered by some to be extremely bad practice, if we are using it to store a character, not a small number, in that if someone reads your code, most readers are forced to look up what character corresponds with the number 97 in the encoding scheme. In the end, letter1 and letter2 store both the same thing – the letter "a", but the first method is clearer, easier to debug, and much more straightforward.
One important thing to mention is that characters for numerals are represented differently from their corresponding number, i.e. '1' is not equal to 1.
I want to write signed integer values into a file in a platform independent way.
If they were unsigned, I would just convert them from host byte order to LE (or BE) with the endian(3) family of functions.
I'm not sure how to deal with signed integers though. If I cast them to unsigned values, I loose the sign, since the C standard does not guarantee that
(int) ((unsigned) -1)) == -1
The other option would be to I cast a pointer to the value (i.e., reinterpret the byte sequence as unsigned), but it I'm not convinced that converting endianness after that is going to give anything sensible.
What is the proper way for platform independent signed integer storage?
Update:
I know that in practice, almost all architectures use two-complement representation, so that I can losslessly convert between signed and unsigned integers. However, this is question is meant to be more theoretical.
Just rolling out my own integer representation (be that storing the decimal letters as ascii characters, or separately storing the sign bit) is of course a solution. However, I'm interested if there is a way that works without completely abandoning the native binary representation.
The simplest solution:
For writing, just convert to unsigned and use your unsigned endian conversion functions.
For reading the values back, first read them into an unsigned variable, and check if the high bit is set, and do some arithmetic to make the conversion well-defined:
uint32_t temp;
int32_t dest;
if (temp > INT32_MAX) dest = -(int32_t)(-temp-1)-1;
else dest = temp;
As an added bonus, a good compiler on a sane system (i.e. a twos-complement system where the implementation-defined conversion to unsigned is "correct") will first optimize -(int32_t)(-temp-1)-1 to (int32_t)temp, then optimize the two branches of the conditional, which now both contain identical code, to a single code path with no branch.
A platform-independent way? If you truly want this, you should consider writing it as text rather than binary (and taking into account that even that is not fully platform-independent since you may want to move it from an ASCII to an EBCDIC platform).
It all depends on how platform-independent you need it to be. C allows for three different signed encodings: two's complement, one's complement and sign/magnitude. But, by far, most machines will use the first one.
Work out first what you actually mean by that term. If you mean you only want to handle two's complement, then casting it to an unsigned is fine.
Use the same approach as when sending data over the network. Convert your unsigned or signed values to big-endian and save them by using htonl(). When reading, convert the data back to your machine endianness by using ntohl().
But as always you need to know if the data originally was signed or unsigned. With just a bit sequence, you can't know for sure.
Options:
Store numbers as plain text using printf()-like functions for conversion
Convert negative numbers to sign + absolute value, store them as unsigned with the extra sign bit
Output a 1 byte sign flag (e.g. 0=positive, 1=negative). If the value is negative make it positive and then write the value in big endian format. If you don't like 0 and 1 you could use '+' and '-'.
Store the sign and the absolute value as 2 fields, and recombine them when you read it back.
You said you already know how to convert to/from a well-defined byte order, so all that is left is to determine the sign (hint < 0 might help here :-)), take the absolute value (which you could do in combination with determining what it is, or using abs() or similar.
Something like:
if (num < 0) {
negative = 1;
num = -num;
} else {
negative = 0
}
write_value = htole32(num);
write(file, &negative, 1);
write(file, &write_value, 4);
As an optimization you could collect the sign bits for values together and store them in a single word before the absolute values.