Seg Fault with isdigit() in C? - c

I have this code:
#include <ctype.h>
char *tokenHolder[2500];
for(i = 0; tokenHolder[i] != NULL; ++i){
if(isdigit(tokenHolder[i])){ printf("worked"); }
}
Here tokenHolder holds tokens from user input that were read with getline and split with strtok. I get a seg fault when I call isdigit on an element of tokenHolder, and I'm not sure why.

Since tokenHolder is an array of char *, when you index tokenHolder[i], you are passing a char * to isdigit(), and isdigit() does not accept pointers.
You are probably missing a second loop, or you need:
if (isdigit(tokenHolder[i][0]))
printf("working\n");
Don't forget the newline.
Your test in the loop is odd too; you normally spell 'null pointer' as 0 or NULL and not as '\0'; that just misleads people.
Also, you need to pay attention to the compiler warnings you are getting! Don't post code that compiles with warnings, or (at the least) specify what the warnings are so people can see what the compiler is telling you. You should be aiming for zero warnings with the compiler set to fussy.
If you are trying to test that the values in the token array are all numbers, then you need a test_integer() function that tries to convert the string to a number and lets you know if the conversion does not use all the data in the string (or you might allow leading and trailing blanks). Your problem specification isn't clear on exactly what you are trying to do with the string tokens that you've found with strtok() etc.
As to why you are getting the core dump:
The code for the isdigit() macro is often roughly
#define isdigit(x) (_Ctype[(x)+1]&_DIGIT)
When you provide a pointer, it is converted to a very large (positive or possibly negative) offset into an array of (usually) 257 values, and because you're accessing memory out of bounds, you get a segmentation fault. The +1 allows EOF to be passed to isdigit() when EOF is -1, which is the usual value but is not mandatory. Macros and functions like isdigit() accept only two kinds of input: a character value as an unsigned char (usually in the range 0..255), or EOF.

You're declaring an array of pointer to char, not a simple array of just char. You also need to initialise the array or assign it some value later. If you read the value of a member of the array that has not been initialised or assigned to, you are invoking undefined behaviour.
char tokenHolder[2500] = {0};
for(int i = 0; tokenHolder[i] != '\0'; ++i){
if(isdigit((unsigned char)tokenHolder[i])){ printf("worked"); }
}
On a side note, you are probably overlooking compiler warnings telling you that your code might not be correct. isdigit expects an int, and a char * is not compatible with int, so your compiler should have generated a warning for that.

You need/want to cast your input to unsigned char before passing it to isdigit.
if(isdigit((unsigned char)tokenHolder[i])){ printf("worked"); }
In most typical encoding schemes, characters outside the US-ASCII range (e.g., letters with umlauts, acute or grave accents, etc.) show up as negative numbers in the typical case that char is signed.
As to how this causes a segmentation fault: isdigit (along with islower, isupper, etc.) is often implemented using a table of bit-fields, and when you call the function the value you pass is used as an index into the table. A negative number ends up trying to index (well) outside the table.
Though I didn't initially notice it, you also have a problem because tokenHolder (probably) isn't the type you expected/planned to use. From the looks of the rest of the code, you really want to define it as:
char tokenHolder[2500];

Related

why does this lead to core dump?

#include <ctype.h>
#include <stdio.h>
int atoi(char *s);
int main()
{
printf("%d\n", atoi("123"));
}
int atoi(char *s)
{
int i;
while (isspace(*s))
s++;
int sign = (*s == '-') ? -1 : 1;
/* same mistake for passing pointer to isdigit, but will not cause CORE DUMP */
// isdigit(s), s++;// this will not lead to core dump
// return -1;
/* */
/* I know s is a pointer, but I don't quite understand why code above will not and code below will */
if (!isdigit(s))
s++;
return -1;
/* code here will cause CORE DUMP instead of a compile-time error */
for (i = 0; *s && isdigit(s); s++)
i = i * 10 + (*s - '0');
return i * sign;
}
I got "Segmentation fault (core dumped)" after I accidentally left out the * operator before s, and the error confused me.
Why does "if (!isdigit(s))" lead to a core dump while "isdigit(s), s++;" does not?
From isdigit [emphasis added]
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
From isdigit [emphasis added]
The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined.
https://godbolt.org/z/PEnc8cW6T
Undefined behaviour means the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
All answers so far have failed to point out the actual problem, which is that implicit pointer-to-integer conversions are not allowed during assignment in C. Details here: "Pointer from integer/integer from pointer without a cast" issues
Specifically C17 6.5.2.2/7
If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters
Where "as if by assignment" sends us to check the rules of assignment 6.5.16.1, which are quoted in the above link. So isdigit(s) is equivalent to something like this:
char* s;
...
int param_to_isdigit = s; // constraint violation of 6.5.16.1
Here the compiler must issue a diagnostic message. If you didn't spot it, or if your tool chain gives warnings instead of errors, check out What compiler options are recommended for beginners learning C? so that code like this fails to compile and you don't spend time troubleshooting bugs that the compiler already spotted for you.
Furthermore, the ctype.h functions require that the passed integer must be representable as unsigned char, but that's another story. C17 7.4 Character handling <ctype.h>:
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF
You are invoking undefined behavior. isdigit() is supposed to receive an int argument, but you pass in a pointer. This is effectively attempting to assign a pointer to an int (xref: Language / Expressions / Assignment operators / Simple assignment, ¶1).
Furthermore, there is a constraint that the argument to isdigit() be representable as an unsigned char or equal to EOF. (xref: Library / Character handling <ctype.h>, ¶1).
As a guess, the isdigit() function may be performing some kind of table lookup, and the input value may cause the function to access a pointer value beyond the table.
Why no segfault from isdigit(s), s++;?
First of all, undefined behavior can manifest itself in many ways, including the program working as intended. That's what undefined means.
But that line is not equivalent to your if statement. It executes isdigit(s), throws away the result, increments s, and throws away the result of that operation too.
However, isdigit does not have side effects, so it's quite probable that the compiler simply removes the call to that function and replaces this line with an unconditional s++. That would explain why it does not segfault. You would have to study the generated assembly to be sure, but it's a possibility.
You can read about the comma operator here What does the comma operator , do?
I wasn't able to repeat this behaviour in MacOS/Darwin, but I was able to in Debian Linux.
To investigate a bit further, I wrote the following program:
#include <ctype.h>
#include <stdio.h>
int main()
{
printf("isalnum('a'): %d\n", isalnum('a'));
printf("isalpha('a'): %d\n", isalpha('a'));
printf("iscntrl('\n'): %d\n", iscntrl('\n'));
printf("isdigit('1'): %d\n", isdigit('1'));
printf("isgraph('a'): %d\n", isgraph('a'));
printf("islower('a'): %d\n", islower('a'));
printf("isprint('a'): %d\n", isprint('a'));
printf("ispunct('.'): %d\n", ispunct('.'));
printf("isspace(' '): %d\n", isspace(' '));
printf("isupper('A'): %d\n", isupper('A'));
printf("isxdigit('a'): %d\n", isxdigit('a'));
printf("isdigit(0x7fffffff): %d\n", isdigit(0x7fffffff));
return 0;
}
In MacOS, this just prints out 1 for every result except the last one, implying that these functions are simply returning the result of a logical comparison.
The results are a bit different in Linux:
isalnum('a'): 8
isalpha('a'): 1024
iscntrl('\n'): 2
isdigit('1'): 2048
isgraph('a'): 32768
islower('a'): 512
isprint('a'): 16384
ispunct('.'): 4
isspace(' '): 8192
isupper('A'): 256
isxdigit('a'): 4096
Segmentation fault
This suggests to me that the library used in Linux is fetching the table entry for the argument and masking it with a bit pattern corresponding to the classification being tested. For example, '1' (ASCII 49) is an alphanumeric character, a digit, a graphic character, a printable character and a hex digit, so entry 49 in this lookup table is probably equal to 8+2048+32768+16384+4096, which is 55304.
The documentation for these functions does mention that the argument must have either the value of an unsigned char (0-255) or EOF (-1), so any value outside this range is causing this table to be read out of bounds, resulting in a segmentation fault.
Although I'm only calling the isdigit() function with an integer argument, the value is outside the range the documentation allows, so the behaviour is formally undefined. Even so, I really think the library functions should be hardened against this sort of problem.

Why is my output wrong? C newbie

#include <stdio.h>
int main(void)
{
char username;
username = '10A';
printf("%c\n", username);
return 0;
}
I just started learning C, and here is my first problem. Why is this program giving me 2 warnings (multi-character constant, overflow in implicit constant conversion)?
And instead of giving 10A as output, it is giving just A.
You are trying to stuff multiple characters into a single set of '', and into a single char variable. You need "" for string literals, and you'll need an array of characters to hold a string. And to print a string, use %s.
Putting all of this together, you get:
#include <stdio.h>
int main(void)
{
char username[] = "10A";
printf("%s\n", username);
return 0;
}
Footnote
From Jonathan Leffler in the comments below regarding multi-character constants:
Note that multi-character constants are a part of C (hence the warning, not an error), but the value of a multi-character constant is implementation defined and hence not portable. It is an integer value; it is larger than fits in a char, so you get that warning. You could have gotten almost anything as the output — 1, A and a null byte could all be plausible.
'10A' is an allowed but obscure way to define a value.
In the case of an int variable,
int username = '10A';
printf("%x\n", username);
will output
313041
Each pair of hexadecimal digits is one character of your input:
0x31 is the '1' of your input.
0x30 is the '0' of your input.
0x41 is the 'A' of your input.
But a char type can't hold this.
In C there are no String objects. Instead, strings are arrays of characters (followed by a null character). Other answers have pointed out statically allocating this memory. However, I recommend dynamically allocating strings. Just remember that C has no garbage collector (unlike Java), so remember to free what you allocate. Have fun!!
You could use char *username to point to the beginning of the string and loop through the memory after it. For instance, use strlen(username) to get the length (sizeof(username) would only give the size of the pointer) and then loop over printf until you have printed that many characters. However, you may end up with major problems if you aren't careful...

Calling isalpha Causing Segmentation Fault

I have the following program that causes a segmentation fault.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
printf("TEST");
for (int k=0; k<(strlen(argv[1])); k++)
{
if (!isalpha(argv[1])) {
printf("Enter only alphabets!");
return 1;
}
}
return 0;
}
I've figured out that it is this line that is causing the problem
if (!isalpha(argv[1])) {
and replacing argv[1] with argv[1][k] solves the problem.
However, I find it rather curious that the program results in a segmentation fault without even printing TEST. I also expected the isalpha function to (incorrectly) check the low byte of the char* pointer argv[1], but this doesn't seem to be the case. I have code to check the number of arguments, but it isn't shown here for brevity.
What's happening here?
In general it is rather pointless to discuss why undefined behaviour leads to this result or the other.
But maybe it doesn't harm to try to understand why something happens even if it is outside the spec.
There are implementations of isalpha which use a simple array to look up all possible unsigned char values. In that case the value passed as parameter is used as index into the array.
While a real character is limited to 8 bits, an integer is not.
The function takes an int as parameter. This is to allow entering EOF as well which does not fit into unsigned char.
If you pass an address like 0x7239482342 into your function this is far beyond the end of the said array and when the CPU tries to read the entry with that index it falls off the rim of the world. ;)
Calling isalpha with such an address is the place where the compiler should raise some warning about converting a pointer to an integer. Which you probably ignore...
The library might contain code that checks for valid parameters but it might also just rely on the user not passing things that shall not be passed.
printf was not flushed.
The implicit conversion from pointer to integer, which ought to have generated at least a compile-time diagnostic for the constraint violation, produced a number that was out of range for isalpha. Because isalpha is implemented as a look-up table, your code accessed the table out of bounds, which is undefined behaviour.
Part of the reason you didn't get a diagnostic might be that isalpha is implemented as a macro. On my computer with Glibc 2.27-3ubuntu1, isalpha is defined as
# define isalpha(c) __isctype((c), _ISalpha)
# define __isctype(c, type) \
((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
The macro contains an unfortunate cast to int, which silences the diagnostic!
One reason why I am posting this answer after so many others is that you didn't fix the code, it still suffers from undefined behaviour given extended characters and char being signed (which happens to be generally the case on x86-32 and x86-64).
The correct argument to give to isalpha is (unsigned char)argv[1][k]! C11 7.4:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
I find it rather curious that the program results in a segmentation fault without even printing TEST
printf doesn't print instantly; it writes to a temporary buffer. End your string with \n if you want it flushed to the actual output (this works when stdout is line-buffered, as it usually is on a terminal).
and replacing argv[1] with argv[1][k] solves the problem.
isalpha is intended to work with single characters.
First of all, a conforming compiler must give you a diagnostic message here. It is not allowed to implicitly convert from a pointer to the int parameter that isalpha expects. (It is a violation of the rules of simple assignment, 6.5.16.1.)
As for why "TEST" isn't printed, it could simply be because stdout isn't flushed. You could try adding fflush(stdout); after printf and see if this solves the issue. Alternatively add a line feed \n at the end of the string.
Otherwise, the compiler is free to re-order the execution of code as long as there are no side effects. That is, it is allowed to execute the whole loop before the printf("TEST");, as long as it prints TEST before it potentially prints "Enter only alphabets!". Such optimizations are probably not likely to happen here, but in other situations they can occur.

Is this (char*)&x cast's behaviour well-defined?

While writing some C code, I came across a little problem where I had to convert a character into a "string" (some memory chunk the beginning of which is given by a char* pointer).
The idea is that if some sourcestr pointer is set (not NULL), then I should use it as my "final string", otherwise I should convert a given charcode into the first character of another array, and use it instead.
For the purposes of this question, we'll assume that the types of the variables cannot be changed beforehand. In other words, I can't just store my charcode as a const char* instead of an int.
Because I tend to be lazy, I thought to myself : "hey, couldn't I just use the character's address and treat that pointer as a string?". Here's a little snippet of what I wrote (don't smash my head against the wall just yet!) :
int charcode = FOO; /* Assume this is always valid ASCII. */
char* sourcestr = "BAR"; /* Case #1 */
char* sourcestr = NULL; /* Case #2 */
char* finalstr = sourcestr ? sourcestr : (char*)&charcode;
Now of course I tried it, and as I expected, it does work. Even with a few warning flags, the compiler is still happy. However, I have this weird feeling that this is actually undefined behaviour, and that I just shouldn't be doing it.
The reason why I think this way is because char* arrays need to be null-terminated in order to be printed properly as strings (and I want mine to be!). Yet, I have no certainty that the value at &charcode + 1 will be zero, hence I might end up with some buffer overflow madness.
Is there an actual reason why it does work properly, or have I just been lucky to get zeroes in the right places when I tried?
(Note that I'm not looking for other ways to achieve the conversion. I could simply use a char tmp[2] = {0} variable, and put my character at index 0. I could also use something like sprintf or snprintf, provided I'm careful enough with buffer overflows. There's a myriad of ways to do this, I'm just interested in the behaviour of this particular cast operation.)
Edit: I've seen a few people call this hackery, and let's be clear: I completely agree with you. I'm not enough of a masochist to actually do this in released code. This is just me getting curious ;)
Your code is well-defined as you can always cast to char*. But some issues:
Note that "BAR" is a string literal. In C its type is char[4], not const char*, but attempting to modify its contents is still undefined behaviour, so don't.
Don't attempt to use (char*)&charcode as a parameter to any of the string functions in the C standard library. It will not be null-terminated. So in that sense, you cannot treat it as a string.
Pointer arithmetic on (char*)&charcode will be valid up to and including one past the scalar charcode. But don't attempt to dereference any pointer beyond charcode itself. The range of n for which the expression (char*)&charcode + n is valid depends on sizeof(int).
The cast and assignment, char* finalstr = (char*)&charcode; is defined.
Printing finalstr with printf as a string, %s, if it points to charcode is undefined behavior.
Rather than resorting to hackery and hiding string in a type int, convert the values stored in the integer to a string using a chosen conversion function. One possible example is:
char str[32] = { 0 };
snprintf( str , 32 , "%d" , charcode );
char* finalstr = sourcestr ? sourcestr : str;
or use whatever other (defined!) conversion you like.
As others have said, it happens to work because the internal representation of an int on your machine is little endian and your char is smaller than an int. Also the ASCII value of your character is either below 128 or you have unsigned chars (otherwise there would be sign extension). This means that the value of the character is in the lower byte(s) of the representation of the int and the rest of the int will be all zeroes (assuming any normal representation of an int). You're not "lucky", you have a pretty normal machine.
It is also completely undefined behavior to give that char pointer to any function that expects a string. You might get away with it now but the compiler is free to optimize that to something completely different.
For example if you do a printf just after that assignment, the compiler is free to assume that you'll always pass a valid string to printf which means that the check for sourcestr being NULL is unnecessary because if sourcestr was NULL printf would be called with something that isn't a string and the compiler is free to assume that undefined behavior never happens. Which means that any check of sourcestr being NULL before or after that assignment are unnecessary because the compiler already knows it isn't NULL. This assumption is allowed to spread to everywhere in your code.
This was rarely a thing to worry about and you could get away with tricks uglier than this until a decade ago or so when compiler writers started an arms race about how much they can follow the C standard to the letter to get away with more and more brutal optimizations. Today compilers are getting more and more aggressive and while the optimization I speculated about probably doesn't exist yet, if a compiler person sees this, they'll probably implement it just because they can.
This is absolutely undefined behavior for the following reasons:
Less probable, but worth considering when reading the standard strictly: you can't assume the size of int on the machine/system where the code will be compiled
As above, you can't assume the codeset. E.g. what happens on an EBCDIC machine/system?
Easy to say that your machine has a little endian processor. On big endian machines the code fails due to big-endian memory layout.
Because on many systems char is a signed integer, as is int, when your char is a negative value (i.e. char>127 on machines having 8bits char), it could fail due to sign extension if you assign the value as in the code below
code:
char ch = FOO;
int charcode = ch;
P.S. About point 3: your string will indeed be null-terminated on a little-endian machine with sizeof(int) > sizeof(char) and a positive char value, because the most significant bytes of the int will be 0 and the memory layout for that endianness is LSB first.

New to C: what's wrong with my program?

I know my way around ruby pretty well and am teaching myself C starting with a few toy programs. This one is just to calculate the average of a string of numbers I enter as an argument.
#include <stdio.h>
#include <string.h>
main(int argc, char *argv[])
{
char *token;
int sum = 0;
int count = 0;
token = strtok(argv[1],",");
while (token != NULL)
{
count++;
sum += (int)*token;
token = strtok(NULL, ",");
}
printf("Avg: %d", sum/count);
printf("\n");
return 0;
}
The output is:
mike@sleepycat:~/projects/cee$ ./avg 1,1
Avg: 49
Which clearly needs some adjustment.
Any improvements and an explanation would be appreciated.
Look for sscanf or atoi as functions to convert from a string (array of characters) to an integer.
Unlike higher-level languages, C doesn't automatically convert between string and integral/real data types.
49 is the ASCII value of '1' char.
It should be helpful to you....:D
The problem is that the character '1' has the ASCII value 49. You have to convert the text to an integer and then average.
In C if you cast a char to an int you just get the ASCII value of it. So, you're averaging the ASCII value of the character 1 twice, and getting what you'd expect.
You probably want to use atoi().
EDIT: Note that this is generally true of all typecasts in C. C doesn't reinterpret values for you, it trusts you to know what exists at a given location.
strtok(
Please, please do not use this. Even its own documentation warns against careless use (it modifies the string it is given and keeps hidden state between calls). I don't know how you, as a Ruby programmer, found out about its existence, but please forget about it.
(int)*token
This is not even close to doing what you want. There are two fundamental problems:
1) A char* does not "contain" text. It points at text. token is of type char*; therefore *token is of type char. That is, a single byte, not a string. Note that I said "byte", not "character", because the name char is actually wrong - an understandable oversight on the part of the language designers, because Unicode did not exist back then. Please understand that char is fundamentally a numeric type. There is no real text type in C! Interpreting a sequence of char values as text is just a convention.
2) Casting in C does not perform any kind of magical conversions.
What your code does is to grab the byte that token points at (after the strtok() call), and cast that numeric value to int. The byte that is rendered with the symbol 1 actually has a value of 49. Again, interpreting a sequence of bytes as text is just a convention, and thus interpreting a byte as a character is just a convention - specifically, here we are using the convention known as ASCII. When you hit the 1 key on your keyboard, and later hit enter to run the program, the chain of events set in motion by the command window actually passed a byte with the value 49 to your program. (In the same way, the comma has a value of 44.)
Both of the above problems are solved by using the proper tools to parse the input. Look up sscanf(). However, you don't even want to pass the input to your program this way, because you can't put any spaces in the input - each "word" on the command line will be passed as a separate entry in the argv[] array.
What you should do, in fact, is take advantage of that, by just expecting each entry in argv[] to represent one number. You can again use sscanf() to parse each entry, and it will be much easier.
Finally:
printf("Avg: %d", sum/count)
The quotient sum/count will not give you a decimal result. Dividing an integer by another integer yields an integer in C, discarding the remainder.
In this line: sum += (int)*token;
Casting a char to an int takes the ASCII value of the char. For '1', this value is 49.
Use the atoi function instead:
sum += atoi(token);
Note atoi is found in the stdlib.h file, so you'll need to #include it as well.
You can't convert a string to an integer via
sum += (int)*token;
Instead you have to call a function like atoi():
sum += atoi (token);
When you cast a char (which is what *token is) to int in C, you get its ASCII value, which is 49 here, so the average of the characters' ASCII values is in fact 49. You need to use atoi to get the value of the number the text represents.

Resources