Side effect when using the sprintf function - C

How come when I use the sprintf function the value of variable A somehow changes?
#include <stdio.h>

int main(void) {
    short int A = 8000;
    char byte_1[2] /* 0001 1111 0100 0000 */, total[4];
    sprintf(byte_1, "%i", A);
    printf("%s\n", byte_1); // displayed on the screen: 8000
    printf("%i\n", A);      // displayed on the screen: 12336
}

byte_1 is too short to receive the decimal representation of A: it only has space for 1 digit plus the null terminator. sprintf does not know the size of the destination, so it writes beyond the end of the byte_1 array, causing undefined behavior.
Make byte_1 larger; 12 bytes is a good start.
sprintf is inherently unsafe. Use snprintf, which protects against buffer overruns:
snprintf(byte_1, sizeof byte_1, "%i", A);
Here is a potential explanation for this unexpected output: imagine byte_1 is located in memory just before A. sprintf converts the value of A to five characters '8', '0', '0', '0' and '\0', which overflow the end of byte_1 and overwrite the value of variable A itself. When you later print A with printf, it no longer holds 8000 but 12336: on a little-endian machine, A's two bytes were each overwritten with '0' (ASCII 0x30), and 0x3030 is 12336. That is just one of an infinite range of possible effects of undefined behavior.
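As a sanity check on that number (assuming ASCII, and the hypothetical layout above where A immediately follows byte_1 on a little-endian machine), two '0' characters side by side form exactly 12336:

#include <stdio.h>

int main(void) {
    printf("'0' = 0x%x\n", '0');     /* 0x30 in ASCII */
    printf("0x3030 = %d\n", 0x3030); /* 12336: two '0' bytes read as a short */
    return 0;
}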
Try this corrected version:
#include <stdio.h>

int main(void) {
    short int A = 8000;
    char byte_1[12], total[4];
    snprintf(byte_1, sizeof byte_1, "%i", A);
    printf("%s\n", byte_1);
    printf("%i\n", A);
    return 0;
}

The text representation of the value stored in A is "8000" - that's four characters plus the string terminator, so byte_1 needs to be at least 5 characters wide. If you want byte_1 to store the representation of any unsigned int, you should make it more like 12 characters wide:
char byte_1[12];
Two characters is not enough to store the string "8000", so when sprintf writes to byte_1, those extra characters are most likely overwriting A.
Also note that the correct conversion specifier for an unsigned int is %u, not %i. This will matter when trying to format very large unsigned values where the most significant bit is set. %i will attempt to format that as a negative signed value.
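For instance, a small sketch of where the two specifiers diverge, using UINT_MAX (all bits set, so the most significant bit is set too):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int big = UINT_MAX; /* e.g. 4294967295 with a 32-bit int */
    printf("%u\n", big);         /* prints the value correctly */
    /* printf("%i\n", big) would reinterpret the same bits as signed,
       typically printing -1; technically undefined behavior */
    return 0;
}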
Edit
As chrqlie pointed out, the OP had declared A as short int - for some reason, another answer had changed that to unsigned int and that stuck in my head. Strictly speaking, the correct conversion specifier for a short int is %hd if you want signed decimal output.
For the record, here's a list of some common conversion specifiers and their associated types:
Specifier   Argument type   Output
---------   -------------   ------
i, d        int             Signed decimal integer
u           unsigned int    Unsigned decimal integer
x, X        unsigned int    Unsigned hexadecimal integer
o           unsigned int    Unsigned octal integer
f           double          Signed decimal float (a float argument is promoted to double)
s           char *          Text string
c           int             Single character (converted to unsigned char)
p           void *          Pointer value, implementation-defined
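As a quick check of the integer rows, here is the same value printed through d, u, x and o:

#include <stdio.h>

int main(void) {
    unsigned int v = 8000;
    printf("%d %u %x %o\n", (int)v, v, v, v); /* 8000 8000 1f40 17500 */
    return 0;
}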
For short and long types, there are some length modifiers:
Specifier   Argument type   Output
---------   -------------   ------
hd          short           Signed decimal integer
hhd         signed char     Signed decimal integer
ld          long            Signed decimal integer
lld         long long       Signed decimal integer
Those same modifiers can be applied to u, x, X, o, etc.
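A short sketch of those length modifiers in action:

#include <stdio.h>

int main(void) {
    signed char c  = -12;
    short       s  = -1234;
    long        l  = -123456789L;
    long long   ll = -1234567890123LL;
    printf("%hhd %hd %ld %lld\n", c, s, l, ll);
    return 0;
}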

byte_1 is too small for the four digits of A's value; it only has room for a single digit plus the null ('\0') terminator. If you make byte_1 an array of 5 bytes, one for each digit plus one for the null byte, the string will fit.
#include <stdio.h>

int main(void) {
    unsigned int A = 8000;
    char byte_1[5], total[4];
    sprintf(byte_1, "%u", A);
    printf("%s\n", byte_1);
    printf("%u\n", A);
    return 0;
}
Basically, messing around with memory and trying to put values into objects that are too small for them is undefined behavior. Compilers will often accept such code, but it is objectively dangerous in C, and no program should be accessing memory like this.

sprintf(byte_1, "%i", A);
The format specifier needs to agree with the variable's type. I suggest the following change:
sprintf(byte_1, "%c", A);
printf("%s\n", byte_1);
EDIT: An additional change, after performing the one above, is to declare A with the same type as byte_1's elements. This will force you to change the value in your example to fit the range of char. Notice that relying on a function to protect you from overflowing is just a bad solution. Instead, it is your responsibility as the designer of this code to choose the proper tools for the job. When working with char variables, you need to use char-sized containers. The same goes for integers, floats, strings, etc. If you have 1 kilogram of sugar, you want a 1 kg container to hold it. You wouldn't use a cup (250 g) because, as you have seen, it overflows. Happy coding in C!

Related

How does printing 577 with %c output "A"?

#include <stdio.h>

int main()
{
    int i = 577;
    printf("%c", i);
    return 0;
}
After compiling, it gives the output "A". Can anyone explain how I'm getting this?
%c converts its argument to unsigned char, so the value wraps around modulo 256:
577 % 256 = 65; // (char code for 'A')
This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher-order bytes are truncated and the lower-order bytes give the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.
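A small sketch of that truncation (assuming ASCII; the low byte of 0x241 is 0x41):

#include <stdio.h>

int main(void) {
    int i = 577;                          /* 0x241 */
    unsigned char low = (unsigned char)i; /* keeps only the low-order byte: 0x41 */
    printf("%d\n", low);                  /* 65 */
    printf("%c\n", low);                  /* A */
    return 0;
}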
How does printing 577 with %c output "A"?
With printf(). "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65, and the corresponding character*2, 'A', is then printed.
This makes no difference if a char is signed or unsigned or encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed, on the stack, register, or .... The endian of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1 All int values are allowed [INT_MIN...INT_MAX]. When a char value is passed as a ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is 'A' in ASCII, the ubiquitous character encoding; other encodings are rarely used.
Just output the value of the variable i in hexadecimal representation:
#include <stdio.h>

int main( void )
{
    int i = 577;
    printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.
577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but telling printf to treat it as a character (because of %c), so printf converts the value to unsigned char, keeping only the least significant byte: 0x41.
To print an integer, you need to use %d or %i.

Relationship between char and ASCII Code?

My computer science teacher taught us that which data type to declare depends on the size of the value a variable needs to hold. Then he demonstrated adding and subtracting a number to/from a char to output a different char, and I remember he said this has something to do with ASCII codes. Can anyone explain this more specifically and clearly? So, is char considered as a number (since we can do math with it) or a character or both? Can we print out the number behind a char? How?
So, is char considered as a number or a character or both?
Both. It is an integer, but that integer value represents a character, as described by the character encoding of your system. The character encoding of the system that your computer science teacher uses happens to be ASCII.
Can we print out the number behind a char? How?
C++ (as the question used to be tagged):
The behaviour of the character output stream (such as std::cout) is to print the represented character when you insert an integer of type char. But the behaviour for all other integer types is to print the integer value. So, you can print the integer value of a char by converting it to another integer type:
std::cout << (unsigned)'c';
C:
There are no templated output streams, so you don't need to do explicit conversion to another integer (except for the signedness). What you need is the correct format specifier for printf:
printf("%hhu", (unsigned char)'c');
hh is the length modifier for an integer of char size, and u is for unsigned, since you are probably interested in the unsigned representation.
A char can hold a number; it's the smallest integer type available on your machine and must have at least 8 bits. It is synonymous with a byte.
Its typical use is to store the codes of characters. Computers can only deal with numbers, so numbers are used to represent characters. Of course, everyone must agree on which number means which character.
C doesn't require a specific character encoding, but most systems nowadays use a superset of ASCII (a very old encoding using only 7 bits), such as UTF-8.
So, if you have a char that holds a character and you add or subtract some value, the result will be another number that happens to be the code for a different character.
In ASCII, the characters 0-9, a-z and A-Z have adjacent code points, so adding e.g. 2 to 'A' gives 'C', as shown below.
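A minimal illustration, assuming ASCII:

#include <stdio.h>

int main(void) {
    char c = 'A';
    printf("%c\n", c + 2);   /* prints C (65 + 2 = 67) */
    printf("%c\n", '0' + 5); /* prints 5: the digit codes are contiguous too */
    return 0;
}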
Can we print out the number behind a char?
Of course. It just depends whether you interpret the value in the char as just a number or as the code of a character. E.g. with printf:
printf("%c\n", 'A'); // prints the character
printf("%hhu\n", (unsigned char)'A'); // prints the number of the code
The cast to (unsigned char) is only needed because char is allowed to be either signed or unsigned; we want to treat it as unsigned here.
A char takes up a single byte. On systems with an 8-bit byte, this gives it a range (assuming char is signed) of -128 to 127. You can print this value as follows:
char a = 65;
printf("a=%d\n", a);
Output:
65
The %d format specifier prints its argument as a decimal integer. If on the other hand you used the %c format specifier, this prints the character associated with the value. On systems that use ASCII, that means it prints the ASCII character associated with that number:
char a = 65;
printf("a=%c\n", a);
Output:
A
Here, the character A is printed because 65 is the ASCII code for A.
You can perform arithmetic on these numbers and print the character for the resulting code:
char a = 65;
printf("a=%c\n", a);
a = a + 1;
printf("a=%c\n", a);
Output:
A
B
In this example we first print A which is the ASCII character with code 65. We then add 1 giving us 66. Then we print the ASCII character for 66 which is B.
Every variable is stored in binary (i.e., as a number); chars are just numbers of a specific size.
They represent a character when interpreted through some character encoding; the ASCII standard is at www.asciitable.com.
As in Igor's comment, if you run the following code, you will see the ASCII character and the decimal and hexadecimal representations of your char:
char c = 'A';
printf("%c %d 0x%x", c, c, c);
Output:
A 65 0x41
As an exercise to understand it better, you could make a program to generate the ASCII Table yourself.
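For example, one possible sketch of that exercise (non-printable control codes are shown as dots):

#include <stdio.h>
#include <ctype.h>

int main(void) {
    /* code, hex and glyph for the 7-bit ASCII range, 8 entries per row */
    for (int i = 0; i < 128; i++) {
        printf("%3d 0x%02x %c%s", i, i,
               isprint(i) ? i : '.',
               (i % 8 == 7) ? "\n" : "   ");
    }
    return 0;
}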
My computer science teacher taught us that which data type to declare depends on the size of the value a variable needs to hold.
This is correct. Different types can represent different ranges of values. For reference, here are the various integral types and the minimum ranges they must be able to represent:
Type                 Minimum Range
----                 -------------
signed char          -127...127
unsigned char        0...255
char                 same as signed char or unsigned char, depending on implementation
short                -32767...32767
unsigned short       0...65535
int                  -32767...32767
unsigned int         0...65535
long                 -2147483647...2147483647
unsigned long        0...4294967295
long long            -9223372036854775807...9223372036854775807
unsigned long long   0...18446744073709551615
An implementation may support a larger range for a given type; for example, on many implementations the range of an int is the same as the range of a long.
C doesn't mandate a fixed size (bit width) for the basic integral types (although unsigned types are the same size as their signed equivalents). At the time C was first developed, byte and word sizes varied between architectures, so it was easier to specify a minimum range of values each type had to represent and leave it to the implementor to map that onto the hardware.
C99 introduced the stdint.h header, which defines fixed-width types like int8_t (8-bit), int32_t (32-bit), etc., so you can define objects with specific sizes if necessary.
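A small sketch using those fixed-width types; the matching printf specifiers come as PRI* macros from <inttypes.h> (which includes <stdint.h>):

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    int8_t   a = -5;          /* exactly 8 bits */
    uint32_t b = 4000000000u; /* exactly 32 bits */
    printf("%" PRId8 " %" PRIu32 "\n", a, b); /* -5 4000000000 */
    return 0;
}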
So, is char considered as a number (since we can do math with it) or a character or both?
char is an integral data type that can represent values in at least the range [0...127]¹, which is the range of encodings for the basic execution character set (upper- and lowercase Latin alphabet, decimal digits 0 through 9, and common punctuation characters). It can be used for storing and doing regular arithmetic on small integer values, but that's not the typical use case.
You can print char objects out as a characters or numeric values:
#include <limits.h> // for CHAR_MAX
#include <ctype.h>  // for isprint
...
printf( "%5s%5s\n", "dec", "char" );
printf( "%5s%5s\n", "---", "----" );
for ( char i = 0; i < CHAR_MAX; i++ )
{
    printf( "%5hhd%5c\n", i, isprint( i ) ? i : '.' );
}
That code will print out the integral value and the associated character, like so (this is ASCII, which is what my system uses):
...
65 A
66 B
67 C
68 D
69 E
70 F
71 G
72 H
73 I
...
Control characters like SOH and EOT don't have an associated printing character, so for those values the code above just prints a '.'.
By definition, a char object takes up a single storage unit (byte); the number of bits in a single storage unit must be at least 8, but could be more.
¹ Plain char may be either signed or unsigned depending on the implementation, so it can represent additional values outside that range, but it must be able to represent *at least* those values.

Since characters from -128 to -1 are the same as those from +128 to +255, what is the point of using unsigned char?

#include <stdio.h>
#include <conio.h>

int main()
{
    char a = -128;
    while (a <= -1)
    {
        printf("%c\n", a);
        a++;
    }
    getch();
    return 0;
}
The output of the above code is the same as the output of the code below:
#include <stdio.h>
#include <conio.h>

int main()
{
    unsigned char a = +128;
    while (a <= +254)
    {
        printf("%c\n", a);
        a++;
    }
    getch();
    return 0;
}
Then why do we use unsigned char and signed char?
K & R, chapter and verse, p. 43 and 44:
There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type char
are signed or unsigned quantities. When a char is converted to an int,
can it ever produce a negative integer? The answer varies from machine
to machine, reflecting differences in architecture. On some machines,
a char whose leftmost bit is 1 will be converted to a negative integer
("sign extension"). On others, a char is promoted to an int by adding
zeros at the left end, and thus is always positive. [...] Arbitrary
bit patterns stored in character variables may appear to be negative
on some machines, yet positive on others. For portability, specify
signed or unsigned if non-character data is to be stored in char
variables.
With printing characters there is no difference:
The function printf() with "%c" takes the int argument, converts it to unsigned char, and then prints it.
char a;
printf("%c\n",a); // a is converted to int, then passed to printf()
unsigned char ua;
printf("%c\n",ua); // ua is converted to int, then passed to printf()
With printing values (numbers) there is a difference when the system's char is signed:
char a = -1;
printf("%d\n",a); // --> -1
unsigned char ua = -1;
printf("%d\n",ua); // --> 255 (Assume 8-bit unsigned char)
Note: Rare machines will have int the same size as char and other concerns apply.
So if code uses a as a number rather than a character, the printing differences are significant.
The bit representation of a number is what the computer stores, but it doesn't mean anything without someone (or something) imposing a pattern onto it.
The difference between the unsigned char and signed char patterns is how we interpret the bits. In one case we decide that zero is the smallest number and count up until we reach 0xFF (binary 11111111). In the other case we decide that 0x80 is the smallest number and count up until we reach 0x7F.
The reason we have this funny way of representing signed numbers (the latter pattern) is that it places zero (0x00) roughly in the middle of the sequence, and because 0xFF (which is -1, right before zero) plus 0x01 (which is 1, right after zero) add together with a carry that falls off the high end, leaving 0x00 (-1 + 1 = 0). Likewise -5 + 5 = 0 by the same mechanism.
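A minimal sketch of that wrap-around: the addition itself happens in int, and storing the result back into an unsigned char discards the carry:

#include <stdio.h>

int main(void) {
    unsigned char x = 0xFF;       /* the two's complement bit pattern of -1 */
    unsigned char sum = x + 0x01; /* carry out of the top bit is discarded */
    printf("0x%02x\n", sum);      /* 0x00, i.e. -1 + 1 = 0 */
    return 0;
}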
For fun, there are a lot of bit patterns that mean different things. For example 0x2a might be what we call a "number" or it might be a * character. It depends on the context we choose to impose on the bit patterns.
Because unsigned char is used for one-byte integers in C89.
Note there are three distinct char-related types in C89: char, signed char, and unsigned char.
For character data, char is used.
unsigned char and signed char are used for one-byte integers, just as short is used for two-byte integers. You should not really use signed char or unsigned char for characters, and neither should you rely on the ordering of their values.
Different types are created to tell the compiler how to "understand" the bit representation of one or more bytes. For example, say I have a byte which contains 0xFF. Interpreted as a signed char (on a two's complement machine) it's -1; interpreted as an unsigned char it's 255.
In your case, a, no matter whether signed or unsigned, is promoted to int and passed to printf(), which later converts it to unsigned char before printing it out as a character.
But let's consider another case:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a = -1;
    unsigned char b;
    memmove(&b, &a, 1);
    printf("%d %u", a, b);
}
It's practically acceptable to simply write printf("%d %u", a, a); memmove() is used above only to avoid the undefined behaviour of passing a (promoted to int) where %u expects an unsigned int. On my machine, the sloppy printf("%d %u", a, a); prints:
-1 4294967295
The memmove() version prints -1 255 instead, since b holds the byte 0xFF read through an unsigned char. Either way, the same bit pattern yields different values depending on the type used to interpret it.
Also, think about this ridiculous question:
Suppose sizeof(int) == 4. Since the character arrays (unsigned char[]){0, 0, 0, 0} through (unsigned char[]){UCHAR_MAX, UCHAR_MAX, UCHAR_MAX, UCHAR_MAX} cover the same bit patterns as the unsigned ints from 0 to UINT_MAX, then what is the point of using unsigned int?

Is it safe to cast a character type to an integer type

#include <stdio.h>

int main() {
    char ch = 'a';
    int x;
    x = ch;
    printf("x=%c", x);
}
Is this code safe to use (considering the endianness of the machine)?
Yes, it is safe to cast a character (like char) type to an integer type (like int).
In this answer and others, endianness is not a factor.
There are several conversions going on here, and no casting:
In C, the character constant 'a' already has type int; its value is the code for the letter a in the execution character set.
'a'
The int is converted to a char.
char ch = 'a';
The char ch is converted to an int x. In theory there could be a loss of data going from char to int **, but in the overwhelming majority of implementations there is none. Typical examples: if char is signed with range -128 to 127, this maps cleanly into int; if char is unsigned with range 0 to 255, this also maps cleanly into int.
int x;
x = ch;
printf("%c", x) uses the int x value passed to it, converts it to unsigned char and then prints that character. (C11dr §7.21.6.1 8 #haccks) Note there is no conversion of x due to the usual conversion of variadic parameters as x is all ready an int.
printf("x=%c", x);
** char and int could be the same size, with char unsigned and having a positive range exceeding int's. This is the one potential problem with converting char to int, although typically there is no loss of data. It could be further complicated should char have a range like 0 to 2³²-1 and int a range of -(2³¹-1) to +(2³¹-1). I know of no such machine.
Yes, casting integer types to bigger integer types is always safe.
The standard library's *getc functions (fgetc, getchar, ...) do just that: they read unsigned chars internally and convert them to int, because int provides room to encode EOF (end of file, usually EOF == -1) alongside every character value.
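The classic input loop depends on exactly that extra room:

#include <stdio.h>

int main(void) {
    int c; /* int, not char, so EOF stays distinguishable from every byte value */
    while ((c = getchar()) != EOF) {
        putchar(c);
    }
    return 0;
}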
Yes it is, because int is bigger than char; using char instead of int, however, would not be safe, for the same reason.
What you are doing is, first:
int x = ch; => assigning the ASCII value of the char to an int
And finally:
printf("x=%c", x); => printing the ASCII value as a char, which prints the actual character corresponding to that value. So yes, it's safe to do that; it's totally predictable behaviour.
But safe does not mean useful: since an integer is bigger than a char, we usually do the inverse to save some memory.
It is safe here because char is converted to int anyway when calling printf.
See C++ variadic arguments.

are int and char represented using the same bits internally by gcc?

I was playing around with Unicode characters (without using wchar_t support), just for fun, using only the regular char data type. I noticed that when printing them in hex they showed up as full 4-byte values instead of just one byte each.
For example, consider this C file:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = (char *) malloc(100);
    fgets(s, 100, stdin);
    while (s && *s != '\0') {
        printf("%x\n", *s);
        s++;
    }
    return 0;
}
After compiling with gcc and giving the 'cent' symbol (hex: c2 a2) as input, I get the following output:
$ ./a.out
¢
ffffffc2: ?
ffffffa2: ?
a:
So instead of just printing c2 and a2, I got whole 4-byte values as if they were of type int.
Does this mean char is not really 1 byte in length, and ASCII just made it look like 1 byte?
Maybe the reason why the upper three bytes become 0xFFFFFF needs a bit more explanation?
The upper three bytes of the value printed for *s have a value of 0xFF due to sign extension.
The char value passed to printf is extended to an int before the call to printf.
This is due to C's default behaviour.
In the absence of signed or unsigned, the compiler may default to interpreting char as either signed char or unsigned char. It is consistently one or the other unless explicitly changed with a command-line option or a pragma. In this case we can see that it is signed char.
In the absence of more information (prototypes or casts), C applies the default argument promotions and passes:
int: char, short, unsigned char and unsigned short are all converted to int. C never passes a char, unsigned char or signed char as a single byte; it always passes an int.
unsigned int: the same size as int, so the value is passed without change.
The compiler needs to decide how to convert the smaller value to an int.
signed values: the upper bytes of the int are sign-extended from the smaller value, which effectively copies the top (sign) bit upwards to fill the int. If the top bit of the smaller signed value is 0, the upper bytes are filled with 0s; if it is 1, the upper bytes are filled with 1s. Hence printf("%x\n", *s) prints ffffffc2.
unsigned values are not sign-extended; the upper bytes of the int are zero-padded.
Hence the reason C can call a function without a prototype (though the compiler will usually warn about that)
So you can write, and expect this to run (though I would hope your compiler issues warnings):
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */

int main (int argc, const char * argv[]) {
    signed char schar[] = "\x70\x80";
    unsigned char uchar[] = "\x70\x80";
    printf("schar[0]=%x schar[1]=%x uchar[0]=%x uchar[1]=%x\n",
           schar[0], schar[1], uchar[0], uchar[1]);
    return 0;
}
That prints:
schar[0]=70 schar[1]=ffffff80 uchar[0]=70 uchar[1]=80
The char value is interpreted by my compiler (Mac gcc) as signed char, so the compiler generates code to sign-extend the char to int before the printf call.
Where the signed char value has its top (sign) bit set (\x80), the conversion to int sign extends the char value. The sign extension fills in the upper bytes (in this case 3 more bytes to make a 4 byte int) with 1's, which get printed by printf as ffffff80
Where the signed char value has its top (sign) bit clear (\x70), the conversion to int still sign extends the char value. In this case the sign is 0, so the sign extension fills in the upper bytes with 0's, which get printed by printf as 70
My example also shows the unsigned char case. There the values are not sign-extended, because they are unsigned; instead they are extended to int with zero padding. It might look like printf is only printing one byte because the adjacent three bytes of the value are 0, but it is printing the entire int: the values happen to be 0x00000070 and 0x00000080 because the unsigned char values were converted to int without sign extension.
You can force printf to only print the low byte of the int, by using suitable formatting (%hhx), so this correctly prints only the value in the original char:
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */

int main (int argc, const char * argv[]) {
    signed char schar[] = "\x70\x80";
    unsigned char uchar[] = "\x70\x80";
    printf("schar[0]=%hhx schar[1]=%hhx uchar[0]=%hhx uchar[1]=%hhx\n",
           schar[0], schar[1], uchar[0], uchar[1]);
    return 0;
}
This prints:
schar[0]=70 schar[1]=80 uchar[0]=70 uchar[1]=80
because printf interprets %hhx as: treat the int argument as an unsigned char. This does not change the fact that the char was sign-extended to an int before printf was called; it is only a way of telling printf how to interpret the contents of that int.
In a way, for signed char *schar, the meaning of %hhx looks slightly misleading, but the %x format interprets its argument as unsigned anyway, and (with my printf) there is no format to print hex for signed values (IMHO it would be confusing).
Sadly, ISO/ANSI don't freely publish our programming language standards, so I can't point you to the specification, but searching the web may turn up working drafts. I haven't tried to find them. I would recommend "C: A Reference Manual" by Samuel P. Harbison and Guy L. Steele as a cheaper alternative to the ISO document.
HTH
No. printf is a variadic function, and arguments to a variadic function undergo the default argument promotions, so the char is promoted to an int; in this case the char was negative, so it gets sign-extended.
%x then tells printf to interpret that promoted value as an unsigned int and print the resulting bits in hexadecimal.
