why does this lead to core dump? - c

#include <ctype.h>
#include <stdio.h>
int atoi(char *s);
int main()
{
printf("%d\n", atoi("123"));
}
int atoi(char *s)
{
int i;
while (isspace(*s))
s++;
int sign = (*s == '-') ? -1 : 1;
/* same mistake for passing pointer to isdigit, but will not cause CORE DUMP */
// isdigit(s), s++;// this will not lead to core dump
// return -1;
/* */
/* I know s is a pointer, but I don't quite understand why code above will not and code below will */
if (!isdigit(s))
s++;
return -1;
/* code here will cause CORE DUMP instead of an comile-time error */
for (i = 0; *s && isdigit(s); s++)
i = i * 10 + (*s - '0');
return i * sign;
}
I got "Segmentation fault (core dumped)" when I accidentally made mistake about missing * operator before 's'
then I got this confusing error.
Why "(!isdigit(s))" lead to core dump while "isdigit(s), s++;" will not.

From isdigit [emphasis added]
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
From isdigit [emphasis added]
The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined.
https://godbolt.org/z/PEnc8cW6T
An undefined behaviour includes it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

All answers so far has failed to point out the actual problem, which is that implicit pointer to integer conversions are not allowed during assignment in C. Details here: "Pointer from integer/integer from pointer without a cast" issues
Specifically C17 6.5.2.2/7
If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters
Where "as if by assignment" sends us to check the rules of assignment 6.5.16.1, which are quoted in the above link. So isdigit(s) is equivalent to something like this:
char* s;
...
int param_to_isdigit = s; // constraint violation of 6.5.16.1
Here the compiler must issue a diagnostic message. If you didn't spot it or in case you are using a tool chain giving warnings instead of errors, check out What compiler options are recommended for beginners learning C? so that you prevent code like this from compiling, so that you don't have to spend time troubleshooting bugs that the compiler already spotted for you.
Furthermore, the ctype.h functions require that the passed integer must be representable as unsigned char, but that's another story. C17 7.4 Character handling <ctype.h>:
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF

You are invoking undefined behavior. isdigit() is supposed to receive an int argument, but you pass in a pointer. This is effectively attempting to assign a pointer to an int (xref: Language / Expressions / Assignment operators / Simple assignment, ¶1).
Furthermore, there is a constraint that the argument to isdigit() be representable as an unsigned char or equal to EOF. (xref: Library / Character handling <ctype.h>, ¶1).
As a guess, the isdigit() function may be performing some kind of table lookup, and the input value may cause the function to access a pointer value beyond the table.

Why no segfault from isdigit(s), s++;?
First of all. Undefined behavior can manifest itself in a lot of ways, including the program working as intended. That's what undefined means.
But that line is not equivalent to your if statement. What this does is that it executes isdigit(s), throws away the result, increments s and also throw away the result of that operation.
However, isdigit does not have side effects, so it's quite probable that the compiler simply removes the call to that function, and replace this line with an unconditional s++. That would explain why it does not segfault. But you would have to study the generated assembly to make sure, but it's a possibility.
You can read about the comma operator here What does the comma operator , do?

I wasn't able to repeat this behaviour in MacOS/Darwin, but I was able to in Debian Linux.
To investigate a bit further, I wrote the following program:
#include <ctype.h>
#include <stdio.h>
int main()
{
printf("isalnum('a'): %d\n", isalnum('a'));
printf("isalpha('a'): %d\n", isalpha('a'));
printf("iscntrl('\n'): %d\n", iscntrl('\n'));
printf("isdigit('1'): %d\n", isdigit('1'));
printf("isgraph('a'): %d\n", isgraph('a'));
printf("islower('a'): %d\n", islower('a'));
printf("isprint('a'): %d\n", isprint('a'));
printf("ispunct('.'): %d\n", ispunct('.'));
printf("isspace(' '): %d\n", isspace(' '));
printf("isupper('A'): %d\n", isupper('A'));
printf("isxdigit('a'): %d\n", isxdigit('a'));
printf("isdigit(0x7fffffff): %d\n", isdigit(0x7fffffff));
return 0;
}
In MacOS, this just prints out 1 for every result except the last one, implying that these functions are simply returning the result of a logical comparison.
The results are a bit different in Linux:
isalnum('a'): 8
isalpha('a'): 1024
iscntrl('\n'): 2
isdigit('1'): 2048
isgraph('a'): 32768
islower('a'): 512
isprint('a'): 16384
ispunct('.'): 4
isspace(' '): 8192
isupper('A'): 256
isxdigit('a'): 4096
Segmentation fault
This suggests to me that the library used in Linux is fetching values from a lookup table and masking them with a bit pattern corresponding to the argument provided. For example, '1' (ASCII 49) is an alphanumeric character, a digit, a printable character and a hex digit, so entry 49 in this lookup table is probably equal to 8+2018+32768+16384+4096, which is 55274.
The documentation for these functions does mention that the argument must have either the value of an unsigned char (0-255) or EOF (-1), so any value outside this range is causing this table to be read out of bounds, resulting in a segmentation error.
Since I'm only calling the isdigit() function with an integer argument, this can hardly be described as undefined behaviour. I really think the library functions should be hardened against this sort of problem.

Related

Clarification about precedence of operators

I got this snippet from some exercises and the question: which is the output of following code:
main()
{
char *p = "ayqm";
printf("%c", ++*(p++));
}
My expected answer was z but the actual answer was in fact b. How is that possible?
Later edit: the snippet is taken as it is from an exercise and did not focus on the string literal or syntax issues existent in other than the printf() code zone.
As posted, the program has multiple problems:
it tries to modify the string constant "ayqm", which described as undefined behavior in the C Standard.
it uses printf without a proper declaration, again producing undefined behavior.
its output is not terminated with a newline, causing implementation defined behavior.
the prototype for main without a return type is obsolete, no longer supported by the C Standard.
incrementing characters produces implementation defined behavior. If the execution character set is ASCII, 'a'+1 does produce 'b', but it is not guaranteed by the C Standard. Indeed in the EBCDIC character set still used in older mainframe computers letters are in a single monotonic sequence (ie: 'a'+1 == 'b' but 'i'+1 != 'j' in this character set).
Here is a corrected version:
#include <stdio.h>
int main(void) {
char str[] = "ayqm";
char *p = str;
printf("%c\n", ++*(p++));
return 0;
}
p is post-incremented, which means the current value of p is used for the * operator and the value of p is incremented before the next sequence point, namely the call to the printf function. The character read through p, 'a' is then incremented, which may or may not produce 'b' depending on the execution character set.
After printf returns to the main function, p points to str[1] and str contains the string "byqm".
Your program is having undefined behavior because it is trying to modify the string literal "ayqm". As per the standard attempting to modify a string literal results in undefined behavior because it may be stored in read-only storage.
The pointer p is pointing to string literal "ayqm". This expression
printf ("%c", ++*(p++));
end up attempting to modify the string literal that pointer p is pointing to.
An undefined behavior in a program includes it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Calling isalpha Causing Segmentation Fault

I have the following program that causes a segmentation fault.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
printf("TEST");
for (int k=0; k<(strlen(argv[1])); k++)
{
if (!isalpha(argv[1])) {
printf("Enter only alphabets!");
return 1;
}
}
return 0;
}
I've figured out that it is this line that is causing the problem
if (!isalpha(argv[1])) {
and replacing argv[1] with argv[1][k] solves the problem.
However, I find it rather curious that the program results in a segmentation fault without even printing TEST. I also expect the isalpha function to incorrectly check if the lower byte of the char* pointer to argv[1], but this doesn't seem to be the case. I have code to check for the number of arguments but isn't shown here for brevity.
What's happening here?
In general it is rather pointless to discuss why undefined behaviour leads to this result or the other.
But maybe it doesn't harm to try to understand why something happens even if it is outside the spec.
There are implementation of isalpha which use a simple array to lookup all possible unsigned char values. In that case the value passed as parameter is used as index into the array.
While a real character is limited to 8 bits, an integer is not.
The function takes an int as parameter. This is to allow entering EOF as well which does not fit into unsigned char.
If you pass an address like 0x7239482342 into your function this is far beyond the end of the said array and when the CPU tries to read the entry with that index it falls off the rim of the world. ;)
Calling isalpha with such an address is the place where the compiler should raise some warning about converting a pointer to an integer. Which you probably ignore...
The library might contain code that checks for valid parameters but it might also just rely on the user not passing things that shall not be passed.
printf was not flushed
the implicit conversion from pointer to integer that ought to have generated at least compile-time diagnostics for constraint violation produced a number that was out of range for isalpha. isalpha being implemented as a look-up table means that your code accessed the table out of bounds, therefore undefined behaviour.
Why you didn't get diagnostics might be in one part because of how isalpha is implemented as a macro. On my computer with Glibc 2.27-3ubuntu1, isalpha is defined as
# define isalpha(c) __isctype((c), _ISalpha)
# define __isctype(c, type) \
((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
the macro contains an unfortunate cast to int in it, which will silence your error!
One reason why I am posting this answer after so many others is that you didn't fix the code, it still suffers from undefined behaviour given extended characters and char being signed (which happens to be generally the case on x86-32 and x86-64).
The correct argument to give to isalpha is (unsigned char)argv[1][k]! C11 7.4:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
I find it rather curious that the program results in a segmentation fault without even printing TEST
printf doesn't print instantly, but it writes to a temporal buffer. End your string with \n if you want to flush it to actual output.
and replacing argv[1] with argv[1][k] solves the problem.
isalpha is intended to work with single characters.
First of all, a conforming compiler must give you a diagnostic message here. It is not allowed to implicitly convert from a pointer to the int parameter that isalpha expects. (It is a violation of the rules of simple assignment, 6.5.16.1.)
As for why "TEST" isn't printed, it could simply be because stdout isn't flushed. You could try adding fflush(stdout); after printf and see if this solves the issue. Alternatively add a line feed \n at the end of the string.
Otherwise, the compiler is free to re-order the execution of code as long as there are no side effects. That is, it is allowed to execute the whole loop before the printf("TEST");, as long as it prints TEST before it potentially prints "Enter only alphabets!". Such optimizations are probably not likely to happen here, but in other situations they can occur.

Why compiler is not issuing error while passing an int (not int *) as the argument of scanf()?

I tried the below c program & I expected to get compile time error, but why compiler isn't giving any error?
#include <stdio.h>
#include <conio.h>
int main()
{
int a,b;
printf("Enter a : ");
scanf("%d",&a);
printf("Enter b : ");
scanf("%d",b);
printf("a is %d and b is %d\n",a,b);
getch();
return 0;
}
I am not writing & in scanf("%d",b). At the compile time , compiler don't give any error but during execution value of b is 2686792(garbage value).
As per C11 standard, Chapter §6.3.2.3
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
So, the compiler will allow this, but the result is implementation-defined.
In this particular case, you're passing b to scanf() as an argument, which is uninitialized, which will cause the program to invoke undefined behaviour, once executed.
Also, printf() / scanf() being variadic functions, no check on paramater type is perforfmed in general, unless asked explicitly through compiler flag [See -Wformat].
It is not a a compile time error but a runtime issue.
The compiler expects that you have given a valid address to scan the value to, during runtime only it will come to know whether the address is valid or not .
If you try to scan the value to invalid address it leads to undefined behavior and might see a crash.
It scans fine because scanf expects a memory location in its argument. That's why we use & to give memory location of the corresponding variable.
In your case scanf just scans the entered value and puts it in the memory location which has the value of b(instead of scanning and putting it in the memory location where b is stored).
The & operator in C returns the address of the operand. If you give just the b without the & operator, it will be passed by value and the scanf won't be able to set the value. This would result in unexpected behaviour as you already noticed. It has no way of verifying the address before the runtime and hence you won't notice the issue until runtime.

Seg Fault with isdigit() in C?

I have this code:
#include <ctype.h>
char *tokenHolder[2500];
for(i = 0; tokenHolder[i] != NULL; ++i){
if(isdigit(tokenHolder[i])){ printf("worked"); }
Where tokenHolder holds the input of char tokens from user input which have been tokenized through getline and strtok. I get a seg fault when trying to use isdigit on tokenHolder — and I'm not sure why.
Since tokenHolder is an array of char *, when you index tokenHolder[i], you are passing a char * to isdigit(), and isdigit() does not accept pointers.
You are probably missing a second loop, or you need:
if (isdigit(tokenHolder[i][0]))
printf("working\n");
Don't forget the newline.
Your test in the loop is odd too; you normally spell 'null pointer' as 0 or NULL and not as '\0'; that just misleads people.
Also, you need to pay attention to the compiler warnings you are getting! Don't post code that compiles with warnings, or (at the least) specify what the warnings are so people can see what the compiler is telling you. You should be aiming for zero warnings with the compiler set to fussy.
If you are trying to test that the values in the token array are all numbers, then you need a test_integer() function that tries to convert the string to a number and lets you know if the conversion does not use all the data in the string (or you might allow leading and trailing blanks). Your problem specification isn't clear on exactly what you are trying to do with the string tokens that you've found with strtok() etc.
As to why you are getting the core dump:
The code for the isdigit() macro is often roughly
#define isdigit(x) (_Ctype[(x)+1]&_DIGIT)
When you provide a pointer, it is treated as a very large (positive or possibly negative) offset to an array of (usually) 257 values, and because you're accessing memory out of bounds, you get a segmentation fault. The +1 allows EOF to be passed to isdigit() when EOF is -1, which is the usual value but is not mandatory. The macros/functions like isdigit() take either an character as an unsigned char — usually in the range 0..255, therefore — or EOF as the valid inputs.
You're declaring an array of pointer to char, not a simple array of just char. You also need to initialise the array or assign it some value later. If you read the value of a member of the array that has not been initialised or assigned to, you are invoking undefined behaviour.
char tokenHolder[2500] = {0};
for(int i = 0; tokenHolder[i] != '\0'; ++i){
if(isdigit(tokenHolder[i])){ printf("worked"); }
On a side note, you are probably overlooking compiler warnings telling you that your code might not be correct. isdigit expects an int, and a char * is not compatible with int, so your compiler should have generated a warning for that.
You need/want to cast your input to unsigned char before passing it to isdigit.
if(isdigit((unsigned char)tokenHolder[i])){ printf("worked"); }
In most typical encoding schemes, characters outside the USASCII range (e.g., any letters with umlauts, accents, graves, etc.) will show up as negative numbers in the typical case that char is a signed.
As to how this causes a segment fault: isdigit (along with islower, isupper, etc.) is often implemented using a table of bit-fields, and when you call the function the value you pass is used as an index into the table. A negative number ends up trying to index (well) outside the table.
Though I didn't initially notice it, you also have a problem because tokenHolder (probably) isn't the type you expected/planned to use. From the looks of the rest of the code, you really want to define it as:
char tokenHolder[2500];

array subscript has type 'char'

I have the following code to read an argument from the command line. If the string is 1 character long and a digit I want to use that as the exit value. The compiler gives me a warning on the second line (array subscript has type 'char' ) This error comes from the second part after the "&&" .
if (args[1] != NULL) {
if ((strlen(args[1]) == 1) && isdigit(*args[1]))
exit(((int) args[1][0]));
else
exit(0);
}
}
Also, when I use a different compiler I get two errors on the next line (exit).
builtin.c: In function 'builtin_command':
builtin.c:55: warning: implicit declaration of function 'exit'
builtin.c:55: warning: incompatible implicit declaration of built-in function 'exit'
The trouble is that the isdigit() macro takes an argument which is an integer that is either the value EOF or the value of an unsigned char.
ISO/IEC 9899:1999 (C Standard – old), §7.4 Character handling <ctype.h>, ¶1:
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
On your platform, char is signed, so if you have a character in the range 0x80..0xFF, it will be treated as a negative integer. The usual implementation of the isdigit() macros is to use the argument to index into an array of flag bits. Therefore, if you pass a char from the range 0x80..0xFF, you will be indexing before the start of the array, leading to undefined behaviour.
#define isdigit(x) (_CharType[(x)+1]&_Digit)
You can safely use isdigit() in either of two ways:
int c = getchar();
if (isdigit(c))
...
or:
if (isdigit((unsigned char)*args[1]))
...
In the latter case, you know that the value won't be EOF. Note that this is not OK:
int c = *args[1];
if (isdigit(c)) // Undefined behaviour if *args[1] in range 0x80..0xFF
...
The warning about 'implicit definition of function exit' means you did not include <stdlib.h> but you should have done so.
You might also notice that if the user gives you a 2 as the first character of the first argument, the exit status will be 50, not 2, because '2' is (normally, in ASCII and UTF-8 and 8859-1, etc) character code 50 ('0' is 48, etc). You'd get 2 (no quotes) by using *args[1] - '0' as the argument to exit(). You don't need a cast on that expression, though it won't do much harm.
It seems that the compiler you use has a macro for isdigit (and not a function, you wouldn't have a warning if it was the case) that uses the argument as a subscript for an array.
That's why isdigit takes an INT as argument, not a char.
One way to remove the warning it to cast your char to int :
isdigit(*args[1]) => isdigit((int)(*args[1]))
The second warning means you want to use the exit function, but it has not been defined yet. Which means you have to do the required #include.
#include <stdlib.h>
is the standard in c-library to use exit(int) function.
BTW, if this code is in your "main" function, you mustn't check "arg[1] == NULL", which could lead to a segmentation fault if the user didn't provide any argument on the command-line. You must check that the argc value (the int parameter) is greater than 1.
It is not completely clear what you want to give as exit code, but probably you want args[1][0] - '0' for the decimal value that the character represents and not the code of the character.
If you do it like that, you'd have the side effect that the type of that expression is int and you wouldn't see the warning.
For exit you probably forgot to include the header file.
Try changing
isdigit(*args[1])
to
isdigit(args[1][0])
Your other errors are because you aren't using #include <stdlib.> which defines the exit function.

Resources