Character operations in C

When you cast a character to an int in C, what exactly is happening? Since characters are one byte and ints are four, how are you able to get an integer value for a character? Is it the bit pattern that is treated as a number? Take for example the character 'A'. Is the bit pattern 01000001 (i.e. 65 in binary)?

char and int are both integer types.
When you convert a value from any arithmetic (integer or floating-point) type to another arithmetic type, the conversion preserves the value whenever possible. Arithmetic conversions are always defined in terms of values, not representations (though some of the rules are designed to be simply implemented on most hardware).
In your case, you might have:
char c = 'A';
int i = c;
c is an object of type char with the value 65 (assuming an ASCII representation). The conversion from char to int yields an int with the value 65. The compiler generates whatever code is necessary to make that happen; in terms of representation, it could either sign-extend or pad with 0 bits.
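As a rough sketch of the value-preserving conversion described above (char_to_int and char_is_signed are hypothetical helper names, not part of any library, and the value 65 for 'A' assumes an ASCII-based character set):

```c
#include <limits.h>

/* Converts a char to int. The conversion is defined on the value,
   so the result always equals the original char value. */
int char_to_int(char c)
{
    return c; /* implicit conversion, same effect as (int)c */
}

/* Returns 1 when plain char is a signed type on this implementation. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With ASCII, char_to_int('A') gives 65 whether the compiler sign-extends or zero-pads, because 65 fits in every char type.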
This applies when the value of the source expression can be represented as a value of the target type. For a char to int conversion, that's (almost) always going to be the case. For some other conversions, there are various rules for what to do when the value won't fit:
For any conversion to or from floating-point, if the value is out of range the behavior is undefined ((int)1.0e100 may yield some arbitrary value or it can crash your program), and if it's within range but inexact it's approximated by rounding or truncation;
For conversion of a signed or unsigned integer to an unsigned integer, the result is wrapped ((unsigned)-1 == UINT_MAX);
For conversion of a signed or unsigned integer to a signed integer, the result is implementation-defined (wraparound semantics are common) -- or an implementation-defined signal can be raised.
(Floating-point conversions also have to deal with precision.)
Other than converting integers to unsigned types, you should generally avoid out-of-range conversions.
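The rules above might be illustrated like this. It is only a sketch: to_unsigned and double_to_int_checked are made-up names, and the range test assumes that int is narrow enough (32 bits, say) for its limits to be exact as double values.

```c
#include <limits.h>

/* Conversion to an unsigned type wraps modulo UINT_MAX + 1,
   so this is well defined and returns UINT_MAX for -1. */
unsigned int to_unsigned(int v)
{
    return (unsigned int)v;
}

/* Converts a double to int only when the truncated value fits.
   Returns 0 and leaves *out untouched otherwise, which avoids the
   undefined behavior of an out-of-range conversion. */
int double_to_int_checked(double d, int *out)
{
    /* Both comparisons are false for NaN, so NaN is rejected too. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return 0;
    *out = (int)d; /* in range: truncates toward zero */
    return 1;
}
```

So double_to_int_checked(1.0e100, &n) reports failure instead of invoking undefined behavior, while 3.9 is accepted and truncated to 3.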
Incidentally, though int may happen to be 4 bytes on your system, it could be any size as long as it's able to represent values from -32767 to +32767. The ranges of the various integer types, and even the number of bits in a byte, are implementation-defined (with some restrictions imposed by the standard). 8-bit bytes are almost universal. 32-bit int is very common, though older systems commonly had 16-bit int (and I've worked on systems with 64-bit int).
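If you want to see what your own implementation chose, <limits.h> exposes the relevant constants. A small sketch (report_int_limits is a hypothetical name):

```c
#include <limits.h>
#include <stdio.h>

/* Prints the implementation's byte width and int range.
   Returns the number of bits in an int object (padding bits included). */
int report_int_limits(void)
{
    printf("bits per byte: %d\n", CHAR_BIT);
    printf("sizeof(int):   %zu bytes\n", sizeof(int));
    printf("int range:     %d to %d\n", INT_MIN, INT_MAX);
    return (int)(sizeof(int) * CHAR_BIT);
}
```

The standard only guarantees CHAR_BIT >= 8 and INT_MAX >= 32767; everything beyond that is up to the implementation.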

Related

If I want to store an integer into a char type variable, which byte of the integer will be stored?

int a = 0x11223344;
char b = (char)a;
I am new to programming and learning C. Why do I get the value of b here as D?

If I want to store an integer into a char type variable, which byte of the integer will be stored?
This is not fully defined by the C standard.
In the particular situation you tried it, what likely happened is that the low eight bits of 0x11223344 were stored in b, producing 0x44 (68 in decimal) in b, and printing that prints “D” because your system uses ASCII character codes, and 68 is the ASCII code for “D”.
However, you should be wary of something like this working, because it is contingent on several things, and variations are possible.
First, the C standard allows char to be signed or unsigned. It also allows char to be any width that is eight bits or greater. In most C implementations today, it is eight bits.
Second, the conversion from int to char depends on whether char is signed or unsigned and may not be fully defined by the C standard.
If char is unsigned, then the conversion is defined to wrap modulo M+1, where M is the largest value representable in char. Effectively, this is the same as taking the low byte of the value. If the unsigned char has eight bits, its M is 255, so M+1 is 256.
If char is signed and the value is out of range of the char type, the conversion is implementation-defined: It may either trap or produce an implementation-defined value. Your C implementation may wrap conversions to signed integer types similarly to how it wraps conversions to unsigned types, but another reasonable behavior is to “clamp” out-of-range values to the limits of the type, CHAR_MIN and CHAR_MAX. For example, converting −8000 to char could yield the minimum, −128, while converting 0x11223344 to char could yield the maximum, +127.
Third, the C standard does not require implementations to use ASCII. It is very common to use ASCII. (Usually, the character encoding is not just ASCII, because ASCII covers only values from 0 to 127. C implementations often use some extension beyond ASCII for values from 128 to 255.)
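If what you actually want is a particular byte of the integer, going through unsigned char sidesteps the signed-char problem, because that conversion is fully defined. A minimal sketch (low_byte and byte_at are hypothetical helpers; unsigned long is used so the value 0x11223344 fits on every implementation):

```c
#include <limits.h>

/* Reduces v modulo UCHAR_MAX + 1, which is the low byte of v.
   Conversion to an unsigned type is fully defined by the standard. */
unsigned char low_byte(unsigned long v)
{
    return (unsigned char)v;
}

/* Returns byte number `index` of v, counting from the least
   significant byte. index must be less than sizeof(unsigned long),
   otherwise the shift is undefined. */
unsigned char byte_at(unsigned long v, unsigned int index)
{
    return (unsigned char)((v >> (index * CHAR_BIT)) & UCHAR_MAX);
}
```

With 8-bit bytes, low_byte(0x11223344UL) is 0x44, i.e. 68, which prints as “D” only on an ASCII-based system.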

assigning 128 to char variable in c

The output is the 32-bit 2's complement of 128, that is, 4294967168. How?
#include <stdio.h>
int main()
{
    char a;
    a=128;
    if(a==-128)
    {
        printf("%u\n",a);
    }
    return 0;
}

Compiling your code with warnings turned on gives:
warning: overflow in conversion from 'int' to 'char' changes value from '128' to '-128' [-Woverflow]
which tells you that the assignment a=128; isn't well defined on your platform.
The standard says:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So we can't know what is going on as it depends on your system.
However, if we do some guessing (and note this is just a guess):
128 as 8 bit would be 0b1000.0000
so when you call printf where you get a conversion to int there will be a sign extension like:
0b1000.0000 ==> 0b1111.1111.1111.1111.1111.1111.1000.0000
which - printed as unsigned represents the number 4294967168

The sequence of steps that got you there is something like this:
You assign 128 to a char.
On your implementation, char is signed char and has a maximum value of 127, so 128 overflows.
Your implementation interprets 128 as 0x80. It uses two’s-complement math, so (int8_t)0x80 represents (int8_t)-128.
For historical reasons (relating to the instruction sets of the DEC PDP minicomputers on which C was originally developed), C promotes signed types shorter than int to int in many contexts, including variadic arguments to functions such as printf(), which aren’t bound to a prototype and still use the old argument-promotion rules of K&R C instead.
On your implementation, int is 32 bits wide and also two’s-complement, so (int)-128 sign-extends to 0xFFFFFF80.
When you make a call like printf("%u", x), the runtime interprets the int argument as an unsigned int.
As an unsigned 32-bit integer, 0xFFFFFF80 represents 4,294,967,168.
The "%u\n" format specifier prints this out without commas (or other separators) followed by a newline.
This is all legal, but so are many other possible results. The code is buggy and not portable.
Make sure you don’t overflow the range of your type! (Or if that’s unavoidable, overflow for unsigned scalars is defined as modular arithmetic, so it’s better-behaved.) The workaround here is to use unsigned char, which has a range from 0 to (at least) 255, instead of char.
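The promotion and reinterpretation steps can be reproduced with well-defined conversions only. This is a sketch under the same assumptions as above (a 32-bit unsigned type, here uint32_t); promoted_as_u32 and via_unsigned_char are names invented for the illustration.

```c
#include <stdint.h>

/* Mirrors the promotion and reinterpretation steps: a signed char is
   promoted to int (value preserved), then converted to a 32-bit
   unsigned value, which wraps modulo 2^32. */
uint32_t promoted_as_u32(signed char a)
{
    int promoted = a;
    return (uint32_t)promoted;
}

/* The workaround: unsigned char holds 128 without any
   out-of-range conversion, so the value survives unchanged. */
unsigned int via_unsigned_char(int v)
{
    unsigned char u = (unsigned char)v;
    return u;
}
```

promoted_as_u32(-128) is 0xFFFFFF80, the 4294967168 from the question, while via_unsigned_char(128) simply gives back 128.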

First of all, as I hope you understand, the code you've posted is full of errors, and you would not want to depend on its output. If you were trying to perform any of these manipulations in a real program, you would want to do so in some other, more well-defined, more portable way.
So I assume you're asking only out of curiosity, and I answer in the same spirit.
Type char on your machine is probably a signed 8-bit quantity. So its range is from -128 to +127. So +128 won't fit.
When you try to jam the value +128 into a signed 8-bit quantity, you probably end up with the value -128 instead. And that seems to be what's happening for you, based on the fact that your if statement is evidently succeeding.
So next we try to take the value -128 and print it as if it was an unsigned int, which on your machine is evidently a 32-bit type. It can hold numbers in the range 0 to 4294967295, which obviously does not include -128. But unsigned integers typically behave pretty nicely modulo their range, so if we add 4294967296 to -128 we get 4294967168, which is precisely the number you saw.
Now that we've worked through this, let's resolve in future not to jam numbers that won't fit into char variables, or to print signed quantities with the %u format specifier.
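One way to keep that resolution, sketched with two hypothetical helpers: check the range before storing into a char, and print a char's numeric value with %d, which matches the int it is promoted to.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if v can be stored in a plain char without changing value. */
int fits_in_char(int v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}

/* Formats a char's numeric value with %d, the specifier that matches
   the int a char argument is promoted to. Returns what snprintf returns. */
int format_char_value(char *buf, size_t size, char c)
{
    return snprintf(buf, size, "%d", c);
}
```

Whether fits_in_char(128) is true depends on whether char is signed on your implementation, which is exactly why the check is worth making.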

About integer constants in the book "C: A reference manual"

In section 2.7.1 Integer constants, it says:
To illustrate some of the subtleties of integer constants, assume that
type int uses a 16-bit twos-complement representation, type long uses
a 32-bit twos-complement representation, and type long long uses a
64-bit twos-complement representation. We list in Table 2-6 some
interesting integer constants...
An interesting point to note from this table is that integers in the
range 2^15 through 2^16 - 1 will have positive values when written as
decimal constants but negative values when written as octal or
hexadecimal constants (and cast to type int).
But, as far as I know, integers in the range 2^15 through 2^16 - 1 written as hex/octal constants also have positive values when cast to type unsigned. Is the book wrong?

In the described setup, decimal literals in the range [32768,65535] have type long int, and hexadecimal literals in that range have type unsigned int.
So, the constant 0xFFFF is an unsigned int with value 65535, and the constant 65535 is a signed long int with value 65535.
I think your text is trying to discuss the cases:
(int)0xFFFF
(int)65535
Now, since int cannot represent the value 65535 both of these cause out-of-range conversion which is implementation-defined (or may raise an implementation-defined signal).
Most commonly (in fact, all 2's complement systems I've ever heard of), it will use a combination of truncation and reinterpretation in both of those cases, giving a value of -1.
So the last paragraph of your quote is a bit strange. 65535 and 0xFFFF are both large positive numbers; (int)0xFFFF and (int)65535 are (probably) both negative numbers; but if you cast one and don't cast the other then you get a discrepancy which is not surprising.
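Since most of us no longer have a 16-bit int to experiment with, the common "truncate and reinterpret" behaviour can be emulated with ordinary arithmetic. This is only a model of what such an implementation typically does, not something the standard guarantees; to_int16_value is a hypothetical name.

```c
/* Emulates the common "truncate and reinterpret" conversion to a
   16-bit two's-complement int, using only well-defined arithmetic. */
long to_int16_value(unsigned long v)
{
    unsigned long low = v & 0xFFFFUL; /* keep the low 16 bits */
    return low >= 0x8000UL ? (long)low - 65536L : (long)low;
}
```

Both to_int16_value(0xFFFF) and to_int16_value(65535) give -1, matching the point that the two casts behave alike.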

Cyclic Nature of char datatype

I have been learning C and came across a topic called Cyclic Nature of Data Type in C.
For example:
char c=125;
c=c+10;
printf("%d",c);
The output is -121.
The logic given was:
125+1= 126
125+2= 127
125+3=-128
125+4=-127
125+5=-126
125+6=-125
125+7=-124
125+8=-123
125+9=-122
125+10=-121
This is due to the cyclic nature of the char datatype. Why does char exhibit a cyclic nature? How is that possible for char?

On your system char is signed char. Storing 135 in it is an out-of-range conversion to a signed type, and the result of that is implementation-defined. It may or may not be cyclic, although on most machines performing 2's complement arithmetic you will observe it as cyclic.

The char data type is a signed type on your implementation. As such it can store values in the range -128 to 127. When you store a value greater than 127, you end up with a value that may be negative or positive, depending on how large the stored value is and on the platform you are working on.
Signed integer overflow is undefined behavior in C; only unsigned numbers are guaranteed to wrap around.

char isn’t special in this regard (besides its implementation-defined signedness); all conversions to signed types usually exhibit this “cyclic nature”. However, there are undefined and implementation-defined aspects of signed overflow, so be careful when doing such things.
What happens here:
In the expression
c=c+10
the operands of + are subject to the usual arithmetic conversions. They include integer promotion, which converts all values to int if all values of their type can be represented as an int. This means that the left operand of + (c) is converted to an int (an int can hold every char value [1]). The result of the addition has type int. The assignment implicitly converts this value to a char, which happens to be signed on your platform. An (8-bit) signed char cannot hold the value 135, so it is converted in an implementation-defined way [2]. For gcc:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
Your char has a width of 8, 2^8 is 256, and 135 ≡ -121 (mod 256) (cf. e.g. 2’s complement on Wikipedia).
You didn’t say which compiler you use, but the behaviour should be the same for all compilers (there aren’t really any non-2’s-complement machines anymore and with 2’s complement, that’s the only reasonable signed conversion definition I can think of).
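To make the arithmetic concrete, here is a sketch that evaluates c = c + 10 the way it was described: the addition happens in int, and the result is then reduced modulo 2^8 into -128..127. The reduction is written out by hand so the sketch itself does not rely on any implementation-defined conversion; add_as_int8 is a hypothetical name and an 8-bit signed char is assumed.

```c
/* Adds delta to an 8-bit signed value: the sum is computed in int,
   then reduced modulo 2^8 into the range -128..127. */
int add_as_int8(int c, int delta)
{
    int sum = c + delta;               /* 125 + 10 = 135, fits in int */
    int r = ((sum % 256) + 256) % 256; /* now in 0..255 */
    return r >= 128 ? r - 256 : r;     /* map 128..255 to -128..-1 */
}
```

add_as_int8(125, 10) gives -121, and stepping delta from 1 to 10 reproduces the table in the question.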
Note, that this implementation-defined behaviour only applies to conversions, not to overflows in arbitrary expressions, so e.g.
int n = INT_MAX;
n += 1;
is undefined behaviour and used for optimizations by some compilers (e.g. by optimizing such statements out), so such things should definitely be avoided.
A third case (unrelated here, but for the sake of completeness) is unsigned integer types: no overflow occurs (there are exceptions, however, e.g. bit-shifting by more than the width of the type), and the result is always reduced modulo 2^N for a type with precision N.
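A sketch of how the two arithmetic cases differ in practice (add_checked and add_wrapping are hypothetical names): signed addition has to be guarded before it is performed, whereas unsigned addition can simply be allowed to wrap.

```c
#include <limits.h>

/* Adds two ints only when the result is representable. Returns 0 on
   would-be overflow, so the undefined addition is never executed. */
int add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

/* Unsigned arithmetic wraps modulo UINT_MAX + 1 by definition. */
unsigned int add_wrapping(unsigned int a, unsigned int b)
{
    return a + b;
}
```

add_checked(INT_MAX, 1, &n) reports failure, while add_wrapping(UINT_MAX, 1u) is a perfectly defined 0.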
Related:
A simple C program output is not as expected
Allowing signed integer overflows in C/C++
[1] At least for 8-bit chars, signed chars, or ints with higher precision than char, so virtually always.
[2] The C standard says (C99 and C11 (n1570) 6.3.1.3 p.3) “[…] either the result is implementation-defined or an implementation-defined signal is raised.” I don’t know of any implementation raising a signal in this case. But it’s probably better not to rely on that conversion without reading the compiler documentation.

Typecasting a negative number to positive in C

I wouldn't expect the value that gets printed to be the initial negative value. Is there something I'm missing for type casting?
#include<stdint.h>
int main() {
int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
printf("%d", (uint32_t)color2);
return 0;
}

int32_t color = -2451337;
uint32_t color2 = (uint32_t)color;
The cast is unnecessary; if you omit it, exactly the same conversion will be done implicitly.
For any conversion between two numeric types, if the value is representable in both types, the conversion preserves the value. But since color is negative, that's not the case here.
For conversion from a signed integer type to an unsigned integer type, the result is well defined: the value is reduced modulo 2^N, where N is the width of the unsigned type (C11 6.3.1.3 p.2). Here -2451337 + 4294967296 yields 4292515959.
On two's-complement systems this gives the same result as just copying or reinterpreting the bits making up the representation, and the standard requires int32_t to use two's-complement representation.
(It is the opposite direction, converting an out-of-range value to a signed type, that is implementation-defined. The standard permits one's-complement and sign-and-magnitude representations for signed integer types, but specifically requires int32_t to use two's-complement; a C compiler for a one's-complement CPU probably just wouldn't define int32_t.)
printf("%d", (uint32_t)color2);
Again, the cast is unnecessary, since color2 is already of type uint32_t. But the "%d" format requires an argument of type int, which is a signed type (that may be as narrow as 16 bits). In this case, the uint32_t value isn't converted to int. Most likely the representation of color2 will be treated as if it were an int object, but the behavior is undefined, so as far as the C standard is concerned quite literally anything could happen.
To print a uint32_t value, you can use the PRIu32 macro defined in <inttypes.h>:
printf("%" PRIu32, color2);
Or, perhaps more simply, you can convert it to the widest unsigned integer type and use "%ju":
printf("%ju", (uintmax_t)color2);
This will print the value of color2, which is 4292515959.
And you should add a newline \n to the format string.
More quibbles:
You're missing #include <stdio.h>, which is required if you call printf.
int main() is ok, but int main(void) is preferred.
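Putting the pieces together, a minimal sketch of the conversion and a matching format specifier might look like this. PRIu32 is the unsigned counterpart of PRId32 in <inttypes.h>; to_u32 and format_u32 are hypothetical helpers, and snprintf is used only so the result is easy to inspect.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Conversion from int32_t to uint32_t wraps modulo 2^32. */
uint32_t to_u32(int32_t value)
{
    return (uint32_t)value;
}

/* Formats a uint32_t with the matching PRIu32 conversion specifier.
   Returns what snprintf returns. */
int format_u32(char *buf, size_t size, uint32_t value)
{
    return snprintf(buf, size, "%" PRIu32, value);
}
```

For the value in the question, format_u32(buf, sizeof buf, to_u32(-2451337)) writes 4292515959 into buf.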

You took a bunch of bits (stored in a signed value). You then told the CPU to interpret that bunch of bits as unsigned. You then told the CPU to render the same bunch of bits as signed again (%d). You would therefore see the same value you first entered.
C just deals in bunches of bits. If the value you had chosen was near the representational limit of the type(s) involved (read up on twos-complement representation), then we might see some funky effects, but the value you happened to choose wasn't. So you got back what you put in.
