Calling isalpha Causing Segmentation Fault - c

I have the following program that causes a segmentation fault.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
printf("TEST");
for (int k=0; k<(strlen(argv[1])); k++)
{
if (!isalpha(argv[1])) {
printf("Enter only alphabets!");
return 1;
}
}
return 0;
}
I've figured out that it is this line that is causing the problem
if (!isalpha(argv[1])) {
and replacing argv[1] with argv[1][k] solves the problem.
However, I find it rather curious that the program results in a segmentation fault without even printing TEST. I also expect the isalpha function to incorrectly check if the lower byte of the char* pointer to argv[1], but this doesn't seem to be the case. I have code to check for the number of arguments but isn't shown here for brevity.
What's happening here?

In general it is rather pointless to discuss why undefined behaviour leads to this result or the other.
But maybe it doesn't harm to try to understand why something happens even if it is outside the spec.
There are implementation of isalpha which use a simple array to lookup all possible unsigned char values. In that case the value passed as parameter is used as index into the array.
While a real character is limited to 8 bits, an integer is not.
The function takes an int as parameter. This is to allow entering EOF as well which does not fit into unsigned char.
If you pass an address like 0x7239482342 into your function this is far beyond the end of the said array and when the CPU tries to read the entry with that index it falls off the rim of the world. ;)
Calling isalpha with such an address is the place where the compiler should raise some warning about converting a pointer to an integer. Which you probably ignore...
The library might contain code that checks for valid parameters but it might also just rely on the user not passing things that shall not be passed.

printf was not flushed
the implicit conversion from pointer to integer that ought to have generated at least compile-time diagnostics for constraint violation produced a number that was out of range for isalpha. isalpha being implemented as a look-up table means that your code accessed the table out of bounds, therefore undefined behaviour.
Why you didn't get diagnostics might be in one part because of how isalpha is implemented as a macro. On my computer with Glibc 2.27-3ubuntu1, isalpha is defined as
# define isalpha(c) __isctype((c), _ISalpha)
# define __isctype(c, type) \
((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
the macro contains an unfortunate cast to int in it, which will silence your error!
One reason why I am posting this answer after so many others is that you didn't fix the code, it still suffers from undefined behaviour given extended characters and char being signed (which happens to be generally the case on x86-32 and x86-64).
The correct argument to give to isalpha is (unsigned char)argv[1][k]! C11 7.4:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

I find it rather curious that the program results in a segmentation fault without even printing TEST
printf doesn't print instantly, but it writes to a temporal buffer. End your string with \n if you want to flush it to actual output.
and replacing argv[1] with argv[1][k] solves the problem.
isalpha is intended to work with single characters.

First of all, a conforming compiler must give you a diagnostic message here. It is not allowed to implicitly convert from a pointer to the int parameter that isalpha expects. (It is a violation of the rules of simple assignment, 6.5.16.1.)
As for why "TEST" isn't printed, it could simply be because stdout isn't flushed. You could try adding fflush(stdout); after printf and see if this solves the issue. Alternatively add a line feed \n at the end of the string.
Otherwise, the compiler is free to re-order the execution of code as long as there are no side effects. That is, it is allowed to execute the whole loop before the printf("TEST");, as long as it prints TEST before it potentially prints "Enter only alphabets!". Such optimizations are probably not likely to happen here, but in other situations they can occur.

Related

Why does this code accessing the array after scanf result in a segmentation error?

For some homework I have to write a calculator in C. I wanted to input some string with scanf and then access it. But when I access the first element I get a segmentation error.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
int main(){
char input1[30];
scanf("%s",input1);
printf("%s",input1);
char current = input1[0];
int counter = 0;
while(current != '\0'){
if(isdigit(current) || current == '+' || current == '-' || current == '*' || current == '/'){
counter++;
current = input1[counter];
}else{
printf("invalid input\n");
exit(1);
}
}
return 0;
}
The printf in line 3 returns the string, but accessing it in line 4 returns a segmentation error (tested in gdb). Why?
There are a few potential causes, some of which have been mentioned in the comments (I won't cover those). It's hard to say which one (or more) is the cause of your problem, so I guess it makes sense to iterate them. However, you may notice that I cite some resources in the process... The information is out there, yet you're not stumbling across it until it's too late. Something needs to change with how you research, because this is slowing your progress down.
On input/output dynamics, just a quick note
printf("%s",input1);
Unless we include a trailing newline, this output may be delayed (or "buffered"), which may have the effect of confusing you about the root of your issues. As an alternative to using a trailing newline (which I'd prefer, personally) you could explicitly force partial lines to be written by invoking fflush(stdout) immediately after each of the relevant output operations, or use setbuf to disable buffering entirely. I think this is unlikely to be your problem, but it may mask your problem, so it's important to realise, when using printf to debug, it might be best to include a trailing newline...
On main entry points
The first potential culprit I see is here:
int main()
I don't know why our education system is still pushing these broken lessons. My only guess is the professors learnt many years back using the nowadays irrelevant Turbo C and don't want to stay up-to-date with tech. We can further reduce this to a simple testcase to work out if this is your segfault, but like I said, it's hard to say whether this is actually your problem...
int main() {
char input1[30];
memset(input1, '\x90', sizeof input1);
return 0; // this is redundant for `main` nowadays, btw
}
To explain what's going on here, I'll cite this page, which you probably ought to go and read (in its entirety) once you're done here:
A common misconception for C programmers, is to assume that a function prototyped as follows takes no arguments:
int foo();
In fact, this function is deemed to take an unknown number of arguments. Using the keyword void within the brackets is the correct way to tell the compiler that the function takes NO arguments.
Simply put, if the linker doesn't know/can't work out how many arguments are required for the entry point, there's probably gonna be some oddness to your callstack, and that's gonna occur at the beginning or end of your program.
On input errors, return values and uninitialised access
#include <assert.h>
#include <stdio.h>
#include <string.h>
int main(void) {
char input1[30];
memset(input1, '\x90', sizeof input1);
scanf("%s",input1); // this is sus a.f.
assert(memchr(input1, '\0', sizeof input1));
}
In my testcase, I actually wrote '\x90' to each byte in the array, to show that if the scanf call fails you may end up with an array that has no null terminator. If this is your problem, this assertion is likely to throw (as you can see from the ideone demo) when you run it, which indicates that your loop is likely accessing garbage beyond the bounds of input1. On this note I intended to demonstrate that we (mostly) cannot rely upon scanf and friends unless we also check their return values! There's a good chance your compiler is warning you about this one, so another lesson is uto pay close attention to warning messages, and strive to have none.
On argument expectations for standard library functions
For many standard library functions it may be possible to give input that is outside of the acceptable domain, and so causes instability. The most common form, which I also see in your program, exists in the form of possibly passing invalid values to <ctype.h> functions. In your case, you could change the declaration of current to be an unsigned char instead, but the usual idiom is to put the cast explicitly in the call (like isdigit((unsigned char) current)) so the rest of us can see you're not stuck in this common error, at least while you're learning C.
Please note at this point I'm thinking whichever resources you're using to learn aren't working, because you're stumbling into common traps... please try to find more reputable resources to learn from so you don't fall into more common traps and waste more time later on. If you're struggling, check out the C tag wiki...

why does this lead to core dump?

#include <ctype.h>
#include <stdio.h>
int atoi(char *s);
int main()
{
printf("%d\n", atoi("123"));
}
int atoi(char *s)
{
int i;
while (isspace(*s))
s++;
int sign = (*s == '-') ? -1 : 1;
/* same mistake for passing pointer to isdigit, but will not cause CORE DUMP */
// isdigit(s), s++;// this will not lead to core dump
// return -1;
/* */
/* I know s is a pointer, but I don't quite understand why code above will not and code below will */
if (!isdigit(s))
s++;
return -1;
/* code here will cause CORE DUMP instead of an comile-time error */
for (i = 0; *s && isdigit(s); s++)
i = i * 10 + (*s - '0');
return i * sign;
}
I got "Segmentation fault (core dumped)" when I accidentally made mistake about missing * operator before 's'
then I got this confusing error.
Why "(!isdigit(s))" lead to core dump while "isdigit(s), s++;" will not.
From isdigit [emphasis added]
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
From isdigit [emphasis added]
The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined.
https://godbolt.org/z/PEnc8cW6T
An undefined behaviour includes it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
All answers so far has failed to point out the actual problem, which is that implicit pointer to integer conversions are not allowed during assignment in C. Details here: "Pointer from integer/integer from pointer without a cast" issues
Specifically C17 6.5.2.2/7
If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters
Where "as if by assignment" sends us to check the rules of assignment 6.5.16.1, which are quoted in the above link. So isdigit(s) is equivalent to something like this:
char* s;
...
int param_to_isdigit = s; // constraint violation of 6.5.16.1
Here the compiler must issue a diagnostic message. If you didn't spot it or in case you are using a tool chain giving warnings instead of errors, check out What compiler options are recommended for beginners learning C? so that you prevent code like this from compiling, so that you don't have to spend time troubleshooting bugs that the compiler already spotted for you.
Furthermore, the ctype.h functions require that the passed integer must be representable as unsigned char, but that's another story. C17 7.4 Character handling <ctype.h>:
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF
You are invoking undefined behavior. isdigit() is supposed to receive an int argument, but you pass in a pointer. This is effectively attempting to assign a pointer to an int (xref: Language / Expressions / Assignment operators / Simple assignment, ¶1).
Furthermore, there is a constraint that the argument to isdigit() be representable as an unsigned char or equal to EOF. (xref: Library / Character handling <ctype.h>, ¶1).
As a guess, the isdigit() function may be performing some kind of table lookup, and the input value may cause the function to access a pointer value beyond the table.
Why no segfault from isdigit(s), s++;?
First of all. Undefined behavior can manifest itself in a lot of ways, including the program working as intended. That's what undefined means.
But that line is not equivalent to your if statement. What this does is that it executes isdigit(s), throws away the result, increments s and also throw away the result of that operation.
However, isdigit does not have side effects, so it's quite probable that the compiler simply removes the call to that function, and replace this line with an unconditional s++. That would explain why it does not segfault. But you would have to study the generated assembly to make sure, but it's a possibility.
You can read about the comma operator here What does the comma operator , do?
I wasn't able to repeat this behaviour in MacOS/Darwin, but I was able to in Debian Linux.
To investigate a bit further, I wrote the following program:
#include <ctype.h>
#include <stdio.h>
int main()
{
printf("isalnum('a'): %d\n", isalnum('a'));
printf("isalpha('a'): %d\n", isalpha('a'));
printf("iscntrl('\n'): %d\n", iscntrl('\n'));
printf("isdigit('1'): %d\n", isdigit('1'));
printf("isgraph('a'): %d\n", isgraph('a'));
printf("islower('a'): %d\n", islower('a'));
printf("isprint('a'): %d\n", isprint('a'));
printf("ispunct('.'): %d\n", ispunct('.'));
printf("isspace(' '): %d\n", isspace(' '));
printf("isupper('A'): %d\n", isupper('A'));
printf("isxdigit('a'): %d\n", isxdigit('a'));
printf("isdigit(0x7fffffff): %d\n", isdigit(0x7fffffff));
return 0;
}
In MacOS, this just prints out 1 for every result except the last one, implying that these functions are simply returning the result of a logical comparison.
The results are a bit different in Linux:
isalnum('a'): 8
isalpha('a'): 1024
iscntrl('\n'): 2
isdigit('1'): 2048
isgraph('a'): 32768
islower('a'): 512
isprint('a'): 16384
ispunct('.'): 4
isspace(' '): 8192
isupper('A'): 256
isxdigit('a'): 4096
Segmentation fault
This suggests to me that the library used in Linux is fetching values from a lookup table and masking them with a bit pattern corresponding to the argument provided. For example, '1' (ASCII 49) is an alphanumeric character, a digit, a printable character and a hex digit, so entry 49 in this lookup table is probably equal to 8+2018+32768+16384+4096, which is 55274.
The documentation for these functions does mention that the argument must have either the value of an unsigned char (0-255) or EOF (-1), so any value outside this range is causing this table to be read out of bounds, resulting in a segmentation error.
Since I'm only calling the isdigit() function with an integer argument, this can hardly be described as undefined behaviour. I really think the library functions should be hardened against this sort of problem.

Strings behvior on C

I want to understand a number of things about the strings on C:
I could not understand why you can not change the string in a normal assignment. (But only through the functions of string.h), for example: I can't do d="aa" (d is a pointer of char or a array of char).
Can someone explain to me what's going on behind the scenes - the compiler gives to run such thing and you receive segmentation fault error.
Something else, I run a program in C that contains the following lines:
char c='a',*pc=&c;
printf("Enter a string:");
scanf("%s",pc);
printf("your first char is: %c",c);
printf("your string is: %s",pc);
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
If I put two letters, the first letter printed right! And the string is printed with a lot of profits (incorrect)
If I put a letter, the letter is printed right! And the string is printed with a lot of profits and at the end something weird (a square with four numbers containing zeros and ones)
Can anyone explain what is happening behind?
Please note: I do not want the program to work, I did not ask the question to get suggestions for another program, I just want to understand what happens behind the scenes in these situations.
Strings almost do not exist in C (except as C string literals like "abc" in some C source file).
In fact, strings are mostly a convention: a C string is an array of char whose last element is the zero char '\0'.
So declaring
const char s[] = "abc";
is exactly the same as
const char s[] = {'a','b','c','\0'};
in particular, sizeof(s) is 4 (3+1) in both cases (and so is sizeof("abc")).
The standard C library contains a lot of functions (such as strlen(3) or strncpy(3)...) which obey and/or presuppose the convention that strings are zero-terminated arrays of char-s.
Better code would be:
char buf[16]="a",*pc= buf;
printf("Enter a string:"); fflush(NULL);
scanf("%15s",pc);
printf("your first char is: %c",buf[0]);
printf("your string is: %s",pc);
Some comments: be afraid of buffer overflow. When reading a string, always give a bound to the read string, or else use a function like getline(3) which dynamically allocates the string in the heap. Beware of memory leaks (use a tool like valgrind ...)
When computing a string, be also aware of the maximum size. See snprintf(3) (avoid sprintf).
Often, you adopt the convention that a string is returned and dynamically allocated in the heap. You may want to use strdup(3) or asprintf(3) if your system provides it. But you should adopt the convention that the calling function (or something else, but well defined in your head) is free(3)-ing the string.
Your program can be semantically wrong and by bad luck happening to sometimes work. Read carefully about undefined behavior. Avoid it absolutely (your points 1,2,3 are probable UB). Sadly, an UB may happen to sometimes "work".
To explain some actual undefined behavior, you have to take into account your particular implementation: the compiler, the flags -notably optimization flags- passed to the compiler, the operating system, the kernel, the processor, the phase of the moon, etc etc... Undefined behavior is often non reproducible (e.g. because of ASLR etc...), read about heisenbugs. To explain the behavior of points 1,2,3 you need to dive into implementation details; look into the assembler code (gcc -S -fverbose-asm) produced by the compiler.
I suggest you to compile your code with all warnings and debugging info (e.g. using gcc -Wall -g with GCC ...), to improve the code till you got no warning, and to learn how to use the debugger (e.g. gdb) to run your code step by step.
If I put more than 2 letters (on scanf) I get segmentation fault error, why is this happening?
Because memory is allocated for only one byte.
See char c and assigned with "a". Which is equal to 'a' and '\0' is written in one byte memory location.
If scanf() uses this memory for reading more than one byte, then this is simply undefined behavior.
char c="a"; is a wrong declaration in c language since even a single character is enclosed within a pair of double quotes("") will treated as string in C because it is treated as "a\0" since all strings ends with a '\0' null character.
char c="a"; is wrong where as char c='c'; is correct.
Also note that the memory allocated for char is only 1byte, so it can hold only one character, memory allocation details for datatypes are described bellow

Seg Fault with isdigit() in C?

I have this code:
#include <ctype.h>
char *tokenHolder[2500];
for(i = 0; tokenHolder[i] != NULL; ++i){
if(isdigit(tokenHolder[i])){ printf("worked"); }
Where tokenHolder holds the input of char tokens from user input which have been tokenized through getline and strtok. I get a seg fault when trying to use isdigit on tokenHolder — and I'm not sure why.
Since tokenHolder is an array of char *, when you index tokenHolder[i], you are passing a char * to isdigit(), and isdigit() does not accept pointers.
You are probably missing a second loop, or you need:
if (isdigit(tokenHolder[i][0]))
printf("working\n");
Don't forget the newline.
Your test in the loop is odd too; you normally spell 'null pointer' as 0 or NULL and not as '\0'; that just misleads people.
Also, you need to pay attention to the compiler warnings you are getting! Don't post code that compiles with warnings, or (at the least) specify what the warnings are so people can see what the compiler is telling you. You should be aiming for zero warnings with the compiler set to fussy.
If you are trying to test that the values in the token array are all numbers, then you need a test_integer() function that tries to convert the string to a number and lets you know if the conversion does not use all the data in the string (or you might allow leading and trailing blanks). Your problem specification isn't clear on exactly what you are trying to do with the string tokens that you've found with strtok() etc.
As to why you are getting the core dump:
The code for the isdigit() macro is often roughly
#define isdigit(x) (_Ctype[(x)+1]&_DIGIT)
When you provide a pointer, it is treated as a very large (positive or possibly negative) offset to an array of (usually) 257 values, and because you're accessing memory out of bounds, you get a segmentation fault. The +1 allows EOF to be passed to isdigit() when EOF is -1, which is the usual value but is not mandatory. The macros/functions like isdigit() take either an character as an unsigned char — usually in the range 0..255, therefore — or EOF as the valid inputs.
You're declaring an array of pointer to char, not a simple array of just char. You also need to initialise the array or assign it some value later. If you read the value of a member of the array that has not been initialised or assigned to, you are invoking undefined behaviour.
char tokenHolder[2500] = {0};
for(int i = 0; tokenHolder[i] != '\0'; ++i){
if(isdigit(tokenHolder[i])){ printf("worked"); }
On a side note, you are probably overlooking compiler warnings telling you that your code might not be correct. isdigit expects an int, and a char * is not compatible with int, so your compiler should have generated a warning for that.
You need/want to cast your input to unsigned char before passing it to isdigit.
if(isdigit((unsigned char)tokenHolder[i])){ printf("worked"); }
In most typical encoding schemes, characters outside the USASCII range (e.g., any letters with umlauts, accents, graves, etc.) will show up as negative numbers in the typical case that char is a signed.
As to how this causes a segment fault: isdigit (along with islower, isupper, etc.) is often implemented using a table of bit-fields, and when you call the function the value you pass is used as an index into the table. A negative number ends up trying to index (well) outside the table.
Though I didn't initially notice it, you also have a problem because tokenHolder (probably) isn't the type you expected/planned to use. From the looks of the rest of the code, you really want to define it as:
char tokenHolder[2500];

Please explain this ambiguity in C

When I am compiling this program I am getting some random number as output.. In Cygwin the output is 47 but in RHEL5, it is giving some negative random numbers as output.
Can anyone tell me the reason?
Code:
main()
{
printf("%d");
}
This program provokes undefined behavior since it does not follow the rules of C. You should give printf one argument per format specifier after the format string.
On common C implementations, it prints whatever happens to be on the stack after the pointer to "%d", interpreted as an integer. On others, it may send demons flying out of your nose.
It is Undefined Behaviour.
On 3 counts:
absence of prototype for a function taking a variable number of arguments
lying to printf by telling it you are sending 1 argument and sending none
absence to return a value from main (in C99 a return 0; is assumed, but your code definitely isn't C99)
Anything can happen.
printf expects a second argument, so it reads whatever happens to be on the stack at that location. Essentially it's reading random memory and printing it out.

Resources