What does this piece of code print infinite number of times? - c

I would like to know why this piece of code printf char c infinitely:
char c;
for (c = 0; c< 256; c++)
printf("Char c=%c\n", c);
Thanks

Assuming CHAR_BIT is 8 (which it almost always is), a char object cannot represent a value of 256; signed char will max out at 127, while unsigned char will max out at 255.
Signed case
When c is 127 and you add 1, the value "overflows". Because one's-complement and sign-magnitude representations are still a thing on some oddball architectures, the exact result can vary. For two's-complement, it will "wrap around" to -128, whereas for one's-complement it wraps around to -127, and for sign-magnitude it becomes -0.
All of these values are less than 256, so the condition c < 256 is always true.
Because the result can vary based on the platform, the C language definition places no requirements on the compiler to handle signed integer overflow in any particular way - the behavior is left undefined, and any result is equally "correct" as far as the language is concerned. A reasonably smart compiler might be able to detect that the condition will always evaluate to true and issue a warning, and it would be free to do that as far as the language definition is concerned. Or not.
Unsigned case
Unlike the signed case, unsigned integer overflow is well-defined; if c is 255 and you add 1, then the result wraps around back to 0. Again, 0 is less than 256, so c < 256 will always be true.

Because the char data type is only one byte long, and therefore can only hold the values 0-255.
So when c is 255 and you do c++, it becomes 0. Thus, c is always < 256 in the loop.

Related

Negative power of 2

I am learning the characteristics of the different data type. For example, this program increasingly prints the power of 2 with four different formats: integer, unsigned integer, hexadecimal, octal
#include<stdio.h>
int main(int argc, char *argv[]){
int i, val = 1;
for (i = 1; i < 35; ++i) {
printf("%15d%15u%15x%15o\n", val, val, val, val);
val *= 2;
}
return 0;
}
It works. unsigned goes up to 2147483648. integer goes up to -2147483648. But why does it become negative?
I have a theory: is it because the maximum signed integer we can represent on a 32 bit machine is 2147483647? If so, why does it return the negative number?
First of all, you should understand that this program is undefined. It causes signed integer overflow, and this is declared undefined in the C Standard.
The technical reason is that no behavior can be predicted as different representations are allowed for negative numbers and there could even be padding bits in the representation.
The most probable reason you see a negative number in your case is that your machine uses 2's complement (look it up) to represent negative numbers while arithmetics operate on bits without overflow checks. Therefore, the highest bit is the sign bit, and if your value overflows into this bit, it turns negative.
What you describe is UB caused by integer overflow. Since the behavior is undefined, anything could happen (“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”), BUT, what actually happens on some machines (I suspect yours included) is this:
You start with int val = 1;. That is represented 0b00...1 in binary form. Each time you val *= 2; the value is multiplied by 2, therefore the representation changes to 0b00...10 and then to 0b00...100 and so on (the 1 bit moves one step each time). The last time you val *= 2; you get 0b100.... Now, using 2's complement (which is what I guess your machine uses, as it very common) the value is actually -1 * 0b1000... which is -2147483648
Note, that even though this might be what's really going on in your machine, it's not to be trusted or thought of as the "right" thing to happen, since, as mentioned before, this is UB
In this program, the val value will overflow, if it is a 32- bit machine, because the size of integer is 4 bytes. Now, we have 2 type of values in math, positive and negative, so to do calculation involving negative results, we use sign representations i.e int or char in C language.
Lets take the example of char, range -128 to 127, unsigned char range 0-255 .
It tells, range is divided into two parts for signed representation. So for any signed variable, if it crosses its range of +ve value, it goes into negative value. Like here in case of char, as the value goes above the 127, it becomes -ve. And suppose if you add 300 to any char or unsigned char variable what happens, it rolls over and starts again from zero.
char a=2;
a+=300;
what is the value?? now you know max value of char is 255(total 256 values, including zero), so 300-256 = 44 + 2 =46.
Hope this helps

C strcmp implementation using subtraction of characters

I saw this implementation of strcmp a while back, and I have a question for purely education purposes. Why is it needed to convert the inputs to 16bit integers, do the math and then convert back to 8bit? What is wrong with doing the subtraction in 8bit?
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while ( *s1 && (*s1 == *s2) )
{
s1++;
s2++;
}
return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}
Note: the code assumes 16 bit int type.
EDIT:
It was mentioned that C does conversion to int (suppose 32bit) by default. Is that the case even when the code explicitly states to cast to 16bit int ?
The strcmp(a,b) function is expected to return
<0 if string a < string b
>0 if string a > string b
0 if string a == string b
The test is actually made on the first char being different in the two strings at the same location (0, the string terminator, works as well).
Here since the function takes two uint8_t (unsigned char), the developer was probably worrying about doing a comparison on two unsigned chars would give a number between 0 and 255, hence a negative value would never be returned. For instance, 118 - 236 would return -118, but on 8 bits it would return 138.
Thus the programmer decided to cast to int_16, signed integer (16 bits).
That could have worked, and given the correct negative/positive values (provided that the function returns int_16 instead of int_8).
(*edit: comment from #zwol below, the integer promotion is unavoidable, thus this int16_t casting is not necessary)
However the final int_8 cast breaks the logic. Since returned values may be from -255 to 255, some of these values will see their sign reversed after the cast to int_8.
For instance, doing 255 - 0 gives the positive 255 (on 16 bits, all lower 8 bits to 1, MSB to 0) but in the int_8 world (signed int of 8 bits) this is negative, -1, since we only have the last low 8 bits set to binary 11111111, or decimal -1.
Definitely not a good programming example.
That working function from Apple is better
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
(Linux does it in assembly code...)
Actually, the difference must be done in at least 16 bits¹ for the obvious reason that the range of the result is -255 to 255 and that does not fit in 8 bits. However, sfstewman is correct in noting that it would happen due to implicit integer promotion anyway.
The eventual cast to 8 bits is incorrect, because it can overflow as the range still does not fit in 8 bits. And anyway, strcmp is indeed supposed to return plain int.
¹ 9 would suffice, but bits normally come in batches of 8.
Input data is unsigned 8-bit, so to avoid truncation and effects of overflow/underflow it should be converted to at least 9-bit signed, therefore int16 is used.
return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
This could mean one of these two options:
Either the programmer was confused about how implicit type promotions work in C. Both operands will be implicitly converted to int no matter the casts to int16_t. So if intis for example 32 bits, the code is nonsense. Or otherwise if int is equivalent to int16_t for the specific system - then no conversion at all takes place.
Or the programmer is well-aware about how type promotions work and is writing code that needs to confirm to a standard that bans implicit type promotions, such as MISRA-C. In that case, and in case int is 16 bits on the given system, the code makes perfect sense: it forces an explicit type promotion to dodge warnings from the compiler/static analyser.
I would make a guess that the second option is the most likely, and that this code is indended for a small microcontroller system.
There are certain values that would cause the difference between the two numbers to be different if the int16_t weren't there due to overflow. In an int8_t your range is -128 to 127, in a uint8_t your range is 0 to 255, and in a int16_t your range would be -32,768 to 32,767.
Casing to an int8_t from a uint8_t will cause values over 127 to change signs due to overflow so this keeps that from happening, however the output should be an int16_t due to if you had a 255 - 0 result, it would be a truncated return.

Is arithmetic overflow equivalent to modulo operation?

I need to do modulo 256 arithmetic in C. So can I simply do
unsigned char i;
i++;
instead of
int i;
i=(i+1)%256;
No. There is nothing that guarantees that unsigned char has eight bits. Use uint8_t from <stdint.h>, and you'll be perfectly fine. This requires an implementation which supports stdint.h: any C99 compliant compiler does, but older compilers may not provide it.
Note: unsigned arithmetic never overflows, and behaves as "modulo 2^n". Signed arithmetic overflows with undefined behavior.
Yes, the behavior of both of your examples is the same. See C99 6.2.5 §9 :
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
unsigned char c = UCHAR_MAX;
c++;
Basically yes, there is no overflow, but not because c is of an unsigned type. There is a hidden promotion of c to int here and an integer conversion from int to unsigned char and it is perfectly defined.
For example,
signed char c = SCHAR_MAX;
c++;
is also not undefined behavior, because it is actually equivalent to:
c = (int) c + 1;
and the conversion from int to signed char is implementation-defined here (see c99, 6.3.1.3p3 on integer conversions). To simplify CHAR_BIT == 8 is assumed.
For more information on the example above, I suggest to read this post:
"The Little C Function From Hell"
http://blog.regehr.org/archives/482
Very probably yes, but the reasons for it in this case are actually fairly complicated.
unsigned char i = 255;
i++;
The i++ is equivalent to i = i + 1.
(Well, almost. i++ yields the value of i before it was incremented, so it's really equivalent to (tmp=i; i = i + 1; tmp). But since the result is discarded in this case, that doesn't raise any additional issues.)
Since unsigned char is a narrow type, an unsigned char operand to the + operator is promoted to int (assuming int can hold all possible values in the range of unsigned char). So if i == 255, and UCHAR_MAX == 255, then the result of the addition is 256, and is of type (signed) int.
The assignment implicitly converts the value 256 from int back to unsigned char. Conversion to an unsigned type is well defined; the result is reduced modulo MAX+1, where MAX is the maximum value of the target unsigned type.
If i were declared as an unsigned int:
unsigned int i = UINT_MAX;
i++;
there would be no type conversion, but the semantics of the + operator for unsigned types also specify reduction module MAX+1.
Keep in mind that the value assigned to i is mathematically equivalent to (i+1) % UCHAR_MAX. UCHAR_MAX is usually 255, and is guaranteed to be at least 255, but it can legally be bigger.
There could be an exotic system on which UCHAR_MAX is too be to be stored in a signed int object. This would require UCHAR_MAX > INT_MAX, which means the system would have to have at least 16-bit bytes. On such a system, the promotion would be from unsigned char to unsigned int. The final result would be the same. You're not likely to encounter such a system. I think there are C implementations for some DSPs that have bytes bigger than 8 bits. The number of bits in a byte is specified by CHAR_BIT, defined in <limits.h>.
CHAR_BIT > 8 does not necessarily imply UCHAR_MAX > INT_MAX. For example, you could have CHAR_BIT == 16 and sizeof (int) == 2 i.e., 16-bit bytes and 32 bit ints).
There's another alternative that hasn't been mentioned, if you don't want to use another data type.
unsigned int i;
// ...
i = (i+1) & 0xFF; // 0xFF == 255
This works because the modulo element == 2^n, meaning the range will be [0, 2^n-1] and thus a bitmask will easily keep the value within your desired range. It's possible this method would not be much or any less efficient than the unsigned char/uint8_t version, either, depending on what magic your compiler does behind the scenes and how the targeted system handles non-word loads (for example, some RISC architectures require additional operations to load non-word-size values). This also assumes that your compiler won't detect the usage of power-of-two modulo arithmetic on unsigned values and substitute a bitmask for you, of course, as in cases like that the modulo usage would have greater semantic value (though using that as the basis for your decision is not exactly portable, of course).
An advantage of this method is that you can use it for powers of two that are not also the size of a data type, e.g.
i = (i+1) & 0x1FF; // i %= 512
i = (i+1) & 0x3FF; // i %= 1024
// etc.
This should work fine because it should just overflow back to 0. As was pointed out in a comment on a different answer, you should only do this when the value is unsigned, as you may get undefined behavior with a signed value.
It is probably best to leave this using modulo, however, because the code will be better understood by other people maintaining the code, and a smart compiler may be doing this optimization anyway, which may make it pointless in the first place. Besides, the performance difference will probably be so small that it wouldn't matter in the first place.
It will work if the number of bits that you are using to represent the number is equal to number of bits in binary (unsigned) representation (100000000) of the divisor -1
which in this case is : 9-1= 8 (char)

ansi-c converting char to int representable by ascii

hi i am interested in those chars which are representable by ascii table. for that reason i am doing the following:
int t(char c) { return (int) c; }
...
if(!(t(d)>255)) { dostuff(); }
so i am interested in only ascii table representable chars, which i assume after conversion to int should be less than 256, am i right? thanks!
Usually (not always) a char is 8-bits so all chars would typically have a value of less than 256. So your test would always succeed.
Also, ASCII only goes up to 127, not 255. The characters after that are not standard ASCII, and can vary depending on code pages.
If you are dealing with international characters you should probably use wide characters instead of char.
Use the library:
#include <ctype.h>
...
if (isascii(d)) { dostuff(); }
Two caveats:
The C standard does not decide if char is by default signed or unsigned. If your compiler treated char as signed by default the cast to int could result in negative values instead of the values from 128 to 255 (and this is assuming that your chars are 8-bit, too). Perhaps it's better to use unsigned char if you want to be sure this range will be converted the way you expect.
Technically ASCII is from 0 to 127, everything above is some kind of extension.
char is an integral type in C. You can do the check directly:
char c;
/* assign to c */
if (c >= 0 && c <= 127) {
/* in ASCII range */
}
I am assuming you don't want to use isascii() (it's not in the C standard, although it is POSIX).
Also, you can check if CHAR_MAX is equal to 127. If it is, you don't need the comparison with 127, since c will not exceed it by definition. Similarly, if CHAR_MIN is 0, then you don't need the comparison with 0. Both CHAR_MIN and CHAR_MAX are defined in limits.h.
I think you're thinking about an integer value overflowing a char, and therefore convert it to an int. But, that doesn't help with overflow since the damage has already been done.
Size of char is always 1 byte (as per standard). For all practical matters this means that a char var cannot have a value bigger than 255. (though there are systems, where a byte has more than 8 bits and thus a char value can be bigger, but these are rare nowadays)
Additional caveat is that if char is not defined as signed or unsigned, so it can be in the -128 to 127 range or the 0 to 255 range. (assuming 8 bits per byte, of course :-))
Meanwhile, the ASCII table is 7-bit, which means it covers the range of 0 to 127. So if you are interested in only ASCII symbols, you can just check if the value of your char var is in that range. No need to cast for the comparison.

What does it mean for a char to be signed?

Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.
In fact, when working in constrained memory environments, like embedded 8 bit applications a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) does an arithmetic shift:
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (expect for right shifts).
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type "char" without "signed" or "unsinged" is implementation-defined which means that it can act like a signed or unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>
int main(int argc, char** argv)
{
char a = 'A';
char b = 0xFF;
signed char sa = 'A';
signed char sb = 0xFF;
unsigned char ua = 'A';
unsigned char ub = 0xFF;
printf("a > b: %s\n", a > b ? "true" : "false");
printf("sa > sb: %s\n", sa > sb ? "true" : "false");
printf("ua > ub: %s\n", ua > ub ? "true" : "false");
return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of difference. Most importantly, if you overflow the valid range of a char by assigning it a too big or small integer, and char is signed, the resulting value is implementation defined or even some signal (in C) could be risen, as for all signed types. Contrast that to the case when you assign something too big or small to an unsigned char: the value wraps around, you will get precisely defined semantics. For example, assigning a -1 to an unsigned char, you will get an UCHAR_MAX. So whenever you have a byte as in a number from 0 to 2^CHAR_BIT, you should really use unsigned char to store it.
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c would be too big for char to represent, and the machine uses two's complement. Many implementation behave for the case that you assign a too big value to the char, in that the bit-pattern won't change. If an int will be able to represent all values of char (which it is for most implementations), then the char is being promoted to int before passing to printf. So, the value of what is passed would be negative. Promoting to int would retain that sign. So you will get a negative result. However, if char is unsigned, then the value is unsigned, and promoting to an int will yield a positive int. You can use unsigned char, then you will get precisely defined behavior for both the assignment to the variable, and passing to printf which will then print something positive.
Note that a char, unsigned and signed char all are at least 8 bits wide. There is no requirement that char is exactly 8 bits wide. However, for most systems that's true, but for some, you will find they use 32bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C also is not always exactly 8 bits.
Another difference is, that in C, a unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 .. 2^CHAR_BIT-1. THe same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values, even if you know how your compiler implements the sign stuff (two's complement or the other options), there may be unused padding bits in it. In C++, there are no padding bits for all three character types.
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8 bit EBCIDIC.)
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
I used to do that, rather. Now the newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned char types anyway).
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of the char, c could be one of two values. If char's are unsigned then c will be (char)162. If they are signed then it will an overflow case as the max value for a signed char is 128. I'm guessing most implementations would just return (char)-32.
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ascii char. Of course, it's not portable, so not very useful.

Resources