The standard C library function atoi is documented in ISO 9899:2011 as:
7.22.1 Numeric conversion functions
1 The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined.
...
7.22.1.2 The atoi, atol, and atoll functions
Synopsis
#include <stdlib.h>
int atoi(const char *nptr);
long int atol(const char *nptr);
long long int atoll(const char *nptr);
Description
2 The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to
atoi: (int)strtol(nptr, (char **)NULL, 10)
atol: strtol(nptr, (char **)NULL, 10)
atoll: strtoll(nptr, (char **)NULL, 10)
Returns
3 The atoi, atol, and atoll functions return the converted value.
What is the intended behaviour when the string pointed to by nptr cannot be parsed as an integer? The following four opinions seem to exist:
No conversion is performed and zero is returned. This is the documentation given by some references like this one.
Behaviour is like that of strtol except that errno might not be set. This emerges from taking “Except for the behavior on error” as a reference to §7.22.1 ¶1.
Behaviour is unspecified. This is what POSIX says:
The call atoi(str) shall be equivalent to:
(int) strtol(str, (char **)NULL, 10)
except that the handling of errors may differ. If the value cannot be represented, the behavior is undefined.
Furthermore, the section Application Usage states:
The atoi() function is subsumed by strtol() but is retained because it is used extensively in existing code. If the number is not known to be in range, strtol() should be used because atoi() is not required to perform any error checking.
Note that POSIX claims that its specification is aligned with ISO 9899:1999 (which, as far as I can tell, contains the same language as ISO 9899:2011):
The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard.
According to my local POSIX committee member, this is the historical behaviour of UNIX.
Behaviour is undefined. This interpretation arises because §7.22.1.2 ¶2 never explicitly says what happens on error. Behaviour that is neither defined nor explicitly stated to be implementation-defined or unspecified is undefined.
Which of these interpretations is correct? Please try to refer to authoritative documentation.
What is the intended behaviour when string pointed to by nptr cannot be parsed as an integer?
To be clear, this question applies to
// Case 1
value = atoi("");
value = atoi(" ");
value = atoi("wxyz");
and not the following:
// Case 2
// NULL does not point to a string
value = atoi(NULL);
// Converts the initial portion, but has trailing junk
value = atoi("123xyz");
value = atoi("123 ");
And maybe, maybe not, the following, depending on what is meant by integer.
// Case 3
// Can be parsed as an _integer_, yet overflows an `int`.
value = atoi("12345678901234567890123456789012345678901234567890");
The "non-Case 2" behavior of ato*() depends on the meaning of error in
The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to int, long int, and long long int representation, respectively. Except for the behavior on error, they are equivalent to
atoi: (int)strtol(nptr, (char **)NULL, 10)
...
C11dr §7.22.1.2 2
Certainly error includes case 3: "If the correct value is outside the range of representable values". In this case strto*(), though maybe not ato*(), sets the error number errno defined in <errno.h>. Since the specification of ato*() does not define the behavior for this error (overflow), the result is UB per
Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. C11dr §4 2
For case 1, the behavior of strto*() is well defined and is not specified to affect errno. The spec goes into detail (§7.22.1.4 4) and calls these "no conversion", not an error. So it can be asserted that the case 1 strto*() behavior is not an error, but a "no conversion". Thus per ...
"If no conversion could be performed, zero is returned. C11dr §7.22.1.4 8
... atoi("") must return 0.
Related
Can someone explain the output of the 2nd line?
int x=10;
printf("%d\n",x);
printf("%p\n",x);
printf("%p\n",&x);
Output:
10
0000000a
006aff0c
In your case, it appears to be treating the value 10 as a pointer and outputting it as an eight-digit, zero-padded-on-the-left, hexadecimal number(a).
However, as per C11 7.21.6.1 The fprintf function, /9 (all standards quotes below also refer to C11):
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
So, literally, anything is permitted to happen. You generally want to avoid undefined behaviour as much as possible, as the behaviour can change between implementations, versions or even the day of the week :-)
There's also technically a problem with your third line. The standard states, in 7.21.6.1 /8 for the p conversion specifier:
The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printing characters, in an implementation-defined manner.
Since &x is actually a pointer to int, this puts it in violation of the same contract. You should probably use something like this instead:
printf ("%p\n", (void*)(&x));
(a) What it actually does is implementation-defined, as per the second sentence in the final quote above. It can basically do anything, provided the implementation documents it, as specified in J.3 Implementation-defined behavior:
A conforming implementation is required to document its choice of behavior in each of the areas listed in this subclause.
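For reference, a minimal corrected version of the snippet above, with the pointer cast to void * as the p conversion specifier requires (the printed address is, of course, implementation-specific):

#include <stdio.h>

int main(void)
{
    int x = 10;
    printf("%d\n", x);           /* the value of x as a signed decimal */
    printf("%p\n", (void *)&x);  /* the address of x, in an implementation-defined format */
    return 0;
}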
This question already has answers here:
Can sizeof(int) ever be 1 on a hosted implementation?
According to the C standard, any characters returned by fgetc are returned in the form of unsigned char values, "converted to an int" (that quote comes from the C standard, stating that there is indeed a conversion).
When sizeof (int) == 1, many unsigned char values are outside of range. It is thus possible that some of those unsigned char values might end up being converted to an int value (the result of the conversion being "implementation-defined or an implementation-defined signal is raised") of EOF, which would be returned despite the file not actually being in an erroneous or end-of-file state.
I was surprised to find that such an implementation actually exists. The TMS320C55x CCS manual documents UCHAR_MAX having a corresponding value of 65535, INT_MAX having 32767, fputs and fopen supporting binary mode... What's even more surprising is that it seems to describe the environment as a fully conforming, complete implementation (minus signals).
The C55x C/C++ compiler fully conforms to the ISO C standard as defined by the ISO specification ...
The compiler tools come with a complete runtime library. All library functions conform to the ISO C library standard. ...
Is such an implementation that can return a value indicating errors where there are none, really fully conforming? Could this justify using feof and ferror in the condition section of a loop (as hideous as that seems)? For example, while ((c = fgetc(stdin)) != EOF || !(feof(stdin) || ferror(stdin))) { ... }
The function fgetc() returns an int value in the range of unsigned char only when a proper character is read; otherwise it returns EOF, which is a negative value of type int.
My original answer (I changed it) assumed that there was an integer conversion to int, but this is not the case, since actually the function fgetc() is already returning a value of type int.
I think that, to be conforming, the implementation has to make fgetc() return nonnegative values in the range of int, unless EOF is returned.
In this way, the range of values from 32768 to 65535 will never be associated with character codes in the TMS320C55x implementation.
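Assuming one really is targeting an implementation where a legitimate character value can compare equal to EOF, a defensive read loop might look like the following sketch, which consults feof and ferror rather than trusting the EOF comparison alone:

#include <stdio.h>

int main(void)
{
    int c;
    for (;;) {
        c = fgetc(stdin);
        if (c == EOF && (feof(stdin) || ferror(stdin)))
            break;          /* genuine end-of-file or read error */
        /* c is either an ordinary character, or a value that merely
           happens to compare equal to EOF on an exotic implementation */
        putchar(c);
    }
    if (ferror(stdin))
        fprintf(stderr, "read error\n");
    return 0;
}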
In C, signed integers like -1 are supposedly meant to be declared with the keyword signed, like so:
signed int i = -1;
However, I tried this:
signed int i = -2;
unsigned int i = -2;
int i = -2;
and all 3 cases print out -2 with printf("%d", i);. Why?
Since you confirmed you are printing using:
printf("%d", i);
this is undefined behavior in the unsigned case. This is covered in the draft C99 standard, section 7.19.6.1 The fprintf function (which also covers printf with respect to format specifiers); it says in paragraph 9:
If a conversion specification is invalid, the behavior is undefined.248)[...]
The standard defines undefined behavior in section 3.4.3 as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
and further notes:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Finally, we can see that int is the same as signed int. We can see this by going to section 6.7.2 Type specifiers; in paragraph 2 it groups int as follows:
int, signed, or signed int
and later on, in paragraph 5, says:
Each of the comma-separated sets designates the same type, except that for bit-field[...]
The way an integer variable is printed, is subjected to the format string that you pass to printf:
If you use %d, then you'll be printing it as a signed integer.
If you use %u, then you'll be printing it as an unsigned integer.
printf has no way of knowing what you pass to it. The C compiler applies the default argument promotions when passing the arguments, and then the function itself reinterprets the values in accordance with the format specifiers that you pass, because it has no other information regarding the type of the value that you passed.
When you pass an unsigned int to printf in a position of %d, it is undefined behavior. Your program is incorrect, and it could print anything.
It happens that on hardware that represents negative numbers in two's complement you get the same number that you started with. However, this is not a universal rule.
unsigned int i = -2; // i actually holds 4294967294 (with a 32-bit int)
printf("%d", i); // printf reinterprets those bits as an int, which on two's-complement hardware is -2, hence the same output
You've got 2 things going on:
Signed and unsigned are different ways of interpreting the same 64 (or 32, or whatever) bits.
Printf is a variadic function which accepts parameters of different types
You assigned a signed value (-2) to an unsigned variable, and then asked printf to interpret it as signed.
Remember that "signed" and unsigned have to do with how arithmetic is done on the numbers.
The printf family internally reinterprets whatever you pass in based on the format designators. (This is the nature of variadic functions that accept more than one type of parameter: they cannot use traditional type-safety mechanisms.)
This is all very well, but not all things will work the same.
Addition and subtraction work the same on most architectures (as long as you're not on some oddball architecture that doesn't use 2's complement for representing negative numbers).
Multiplication and division may also work the same.
Inequality comparisons are the hardest to predict, and I have been bitten a number of times doing a comparison between signed and unsigned that I thought would be OK, because the values were in the small signed number range.
That's what "undefined" means. Behaviour is left to the compiler and hardware implementers and cannot be relied upon to be the same between architectures, or even over time on the same architecture.
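As an illustration of that comparison pitfall, here is a small sketch where the usual arithmetic conversions reverse an apparently obvious result:

#include <stdio.h>

int main(void)
{
    int s = -1;
    unsigned int u = 1;

    /* -1 is converted to unsigned int (i.e. UINT_MAX) before the comparison,
       so the "obvious" outcome is reversed. */
    if (s < u)
        puts("-1 < 1u");
    else
        puts("-1 >= 1u (because -1 converts to UINT_MAX)");
    return 0;
}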
I have C code in which I am using the standard library function isalpha() from ctype.h. This is on Visual Studio 2010 on Windows.
In the code below, if char c is '£', the isalpha call triggers a debug assertion failure:
char c='£';
if(isalpha(c))
{
printf ("character %c is alphabetic\n",c);
}
else
{
printf ("character %c is NOT alphabetic\n",c);
}
I can see that this might be because 8-bit ASCII does not have this character.
So how do I handle such non-ASCII characters outside of the ASCII table?
What I want to do is: if any non-alphabetic character is found (even a character outside the 8-bit ASCII table), I want to be able to ignore it.
You may want to cast the value sent to isalpha (and the other functions declared in <ctype.h>) to unsigned char
isalpha((unsigned char)value)
It's one of the (not so) few occasions where a cast is appropriate in C.
Edited to add an explanation.
According to the standard (emphasis mine):
7.4
1 The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
The cast to unsigned char ensures calling isalpha() does not invoke Undefined Behaviour.
You must pass an int to isalpha(), not a char. Note the standard prototype for this function:
int isalpha(int c);
Passing an 8-bit signed character will cause the value to be converted into a negative integer, resulting in an illegal negative offset into the internal arrays typically used by isxxxx().
However, you must ensure that your char is treated as unsigned when casting; you can't simply cast it directly to an int, because if it's an 8-bit character the resulting int would still be negative.
The typical way to ensure this works is to cast it to an unsigned char, and then rely on implicit type conversion to convert that into an int.
e.g.
char c = '£';
int a = isalpha((unsigned char) c);
You may be compiling using wchar_t (Unicode) as the character type; in that case, the isalpha variant to use is iswalpha:
http://msdn.microsoft.com/en-us/library/xt82b8z8.aspx
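If the program really is built around wide characters, a sketch of the wide-character variant might look like the following (the setlocale call is an assumption on my part; whether characters outside the basic set classify as alphabetic depends on the platform's locale support):

#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* classification depends on the current locale */

    wchar_t c = L'£';
    if (iswalpha((wint_t)c))
        printf("character U+%04X is alphabetic\n", (unsigned)c);
    else
        printf("character U+%04X is NOT alphabetic\n", (unsigned)c);
    return 0;
}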
According to section 4.9.6.1 of the C89 draft, %d is a character that specifies the type of conversion to be applied.
The word conversion implies, in my opinion, that printf("%d", 1.0) is defined.
Please confirm or refute this.
The conversion is the conversion of a language value to a lexical representation of that value.
Your theory is wrong; behavior is undefined. The spec says (7.19.6.1p8 and 9, using C99 TC2):
The int argument is converted to signed decimal in the style [−]dddd.
And
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Printf is a varargs function, so no conversion is possible. The compiler just arranges to push a double onto the arguments list. Printf has no way to find out that it's a double versus an int versus an elephant. Result? Chaos.
The word "conversion" here is referring to the conversion of an int (which is the only acceptable argument type here) to a string of characters that make of the decimal representation of that int. It has nothing to do with conversion from other types (such as double) to int.
Not sure if it's officially undefined or an error - but it's wrong!
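For the record, a minimal sketch of the two well-defined alternatives: use a floating-point conversion specifier, or convert the value to int yourself before the call.

#include <stdio.h>

int main(void)
{
    printf("%f\n", 1.0);       /* well-defined: %f expects a double */
    printf("%d\n", (int)1.0);  /* well-defined: the conversion happens before the call */
    /* printf("%d\n", 1.0);    would be undefined behavior: %d expects an int, gets a double */
    return 0;
}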