C segmentation fault errors with feof() and fgetc() - c

Can anyone help me solve my dilemma? When I compile my program I get no errors or warnings. When I actually run the executable, though, I get a segmentation fault. If I understand correctly, this happens because a pointer is being used incorrectly. I get the error specifically on the feof(srcIn) line and I'm not sure why. The FILE* srcIn is never assigned a new value aside from srcIn = fopen(argv[0], "r") at the beginning of the program. I originally had this solution implemented in C++ and needed to change it to C for reasons. Anyway, in the C++ version I did essentially the same thing, except with srcIn.eof() as the condition and srcIn.get(something) as the reading method, and it compiled and ran without any problems.
int chara;
int line[maxLineLength+1];
void nextch(void){
const int charPerTab = 8;
if(charCounter == charLineCounter){
if(feof(srcIn)){
printf("\n");
isEOF = TRUE;
return;
}
printf("\n"); lineCounter++;
if(chara != '\0'){ printf("%c", line[charLineCounter-1]); } // first character each line after the first line will be skipped otherwise
charLineCounter = 0; charCounter = 0;
while(chara != '\n'){
chara = fgetc(srcIn);
if(chara >= ' '){
printf("%c", chara);
line[charLineCounter] = chara; charLineCounter++;
}
else if(chara == '\t'){ // add blanks to next tab
do{ printf(" "); line[charLineCounter] = ' '; charLineCounter++; }
while(charLineCounter % charPerTab != 1);
}
}
printf("\n"); line[charLineCounter] = chara; charLineCounter++; line[charLineCounter] = fgetc(srcIn); charLineCounter++;
// have to get the next character otherwise it will be skipped
}
chara = line[charCounter]; charCounter++;
}
EDIT:
I forgot to mention that I'm not even actually going into main when I get the seg fault. This leads me to believe that the executable itself has some sort of problem. gdb tells me the seg fault is happening at line:
if(feof(srcIn))
Any ideas?

I've got a haunting suspicion that your two-or-four-character indents aren't sufficient to let you see the real scope of the program. It could be as simple as @mu is too short and @Null Set point out, that you've got an argv[0] where you meant argv[1], or it could be, as @Lou Franco points out, that you're writing past the end of your array, but this code sure smells funny. Here's your code, run through Lindent to get larger tabs and one statement per line:
int chara;
int line[maxLineLength + 1];
void nextch(void)
{
        const int charPerTab = 8;
        if (charCounter == charLineCounter) {
                if (feof(srcIn)) {
                        printf("\n");
                        isEOF = TRUE;
                        return;
                }
                printf("\n");
                lineCounter++;
                if (chara != '\0') {
                        printf("%c", line[charLineCounter - 1]);
                } // first character each line after the first line will be skipped otherwise
                charLineCounter = 0;
                charCounter = 0;
                while (chara != '\n') {
                        chara = fgetc(srcIn);
                        if (chara >= ' ') {
                                printf("%c", chara);
                                line[charLineCounter] = chara;
                                charLineCounter++;
                        } else if (chara == '\t') { // add blanks to next tab
                                do {
                                        printf(" ");
                                        line[charLineCounter] = ' ';
                                        charLineCounter++;
                                }
                                while (charLineCounter % charPerTab != 1);
                        }
                }
                printf("\n");
                line[charLineCounter] = chara;
                charLineCounter++;
                line[charLineCounter] = fgetc(srcIn);
                charLineCounter++;
                // have to get the next character otherwise it will be skipped
        }
        chara = line[charCounter];
        charCounter++;
}
You check whether you've reached the end of the file once, at the top, in an if statement, but you never check for EOF again. Never. When you read from input in your while() loop, you use '\n' as your exit condition, print the character if it is at or above ' ', do some tab expansion if you read a '\t', and forget to handle the EOF return from fgetc(3). If your input file doesn't contain a '\n', or doesn't end directly on one, fgetc() keeps returning EOF (typically -1), the loop never sees the '\n' it is waiting for, and the program will probably spin there forever rather than stop cleanly.
Most loops that read one character from an input stream and operate on it are written like this:
int c;
FILE *f = fopen("foo", "r");
if (!f) {
        /* error message if appropriate */
        return;
}
while ((c = fgetc(f)) != EOF) {
        if (' ' <= c) {
                /* printable, including the space itself, as with chara >= ' ' in the question */
                putchar(c);
                line[counter++] = c;
        } else if ('\t' == c) {
                /* complex tab code */
        } else if ('\n' == c) {
                putchar('\n');
                line[counter++] = c;
        }
}
Check the input for EOF. Read input from only one spot, if you can. Use one table or one if/else if/else if/else tree to decide what to do with your input character. The array[index++] = value; idiom might not come naturally at first, but it is common in C.
Feel free to steal my suggested loop format for your own code, and pop in the complex tab expansion code. It looked like you got that right, but I'm not positive on that, and I didn't want it to distract from the overall style of the loop. I think you'll find extending my code to solve your problem is easier than making yours work. (I fully expect you can, but I don't think it'd be fun to maintain.)
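To make the loop format above concrete, here is a minimal sketch of a line reader that checks for EOF in the loop condition, reads from one spot only, and never writes past the end of the buffer. The name read_expanded_line, the MAX_LINE_LENGTH value, and tab stops every 8 columns are my assumptions, not something taken from the question (which uses an int array and a slightly different tab test).

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE_LENGTH 80   /* assumed limit; the question's maxLineLength is not shown */
#define CHARS_PER_TAB 8      /* assumed tab stops every 8 columns */

/* Reads one line from f into line[], expanding tabs to spaces.
 * Stores at most cap - 1 characters plus a terminating '\0';
 * characters that do not fit are read and discarded.
 * Returns the number of characters stored, or -1 if EOF was
 * reached before any character of a line could be read. */
int read_expanded_line(FILE *f, char *line, int cap)
{
    int c;
    int n = 0;

    assert(f != NULL && line != NULL && cap > 0);

    while ((c = fgetc(f)) != EOF && c != '\n') {
        if (c == '\t') {
            /* pad with blanks up to the next tab stop, if room remains */
            do {
                if (n >= cap - 1)
                    break;
                line[n++] = ' ';
            } while (n % CHARS_PER_TAB != 0);
        } else if (c >= ' ' && n < cap - 1) {
            line[n++] = (char)c;
        }
    }
    line[n] = '\0';

    if (c == EOF && n == 0)
        return -1;
    return n;
}
```

A caller would declare char line[MAX_LINE_LENGTH + 1]; and keep calling read_expanded_line(srcIn, line, sizeof line) until it returns -1. A last line without a trailing '\n' is still returned, and the call after that reports the end of input.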

argv[0] is the name of your program, so your fopen(argv[0], "r") is probably failing or opening the executable itself. I'd guess that you want to open argv[1] instead. And, of course, check that the fopen succeeds before trying to use its return value.

It should probably be srcIn = fopen(argv[1], "r") instead. The 0th string parameter your main gets is normally the name of the program, and the 1st parameter is the first command-line parameter you passed to the program.
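As a rough sketch of what both answers suggest, the opening step could check the argument count and the fopen() result before anything reads from the stream. The helper name open_source and the usage message are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Opens the file named by the first command-line argument.
 * Returns the stream, or NULL after printing a diagnostic. */
FILE *open_source(int argc, char *argv[])
{
    assert(argv != NULL);

    if (argc < 2) {
        /* argv[0] is the program name, so argv[1] is the first real argument */
        fprintf(stderr, "usage: %s <source-file>\n",
                argc > 0 ? argv[0] : "prog");
        return NULL;
    }

    FILE *f = fopen(argv[1], "r");
    if (f == NULL)
        perror(argv[1]);   /* reports why the open failed */
    return f;
}
```

In main you would write srcIn = open_source(argc, argv); and return early if it is NULL, so feof(srcIn) and fgetc(srcIn) are never handed a null pointer.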

It might not be in this function, but if the problem is here, I'd be most suspicious of going out of bounds on line. Are you ever writing more than maxLineLength characters? You should put a check in before you ever index into line.
Edit: You seem to be confused about what this error even means -- I will try to clear it up.
When you get a segmentation fault, the line that it happens on is just the line of code where it was finally detected that you have corrupted memory. It doesn't necessarily have anything to do with the real problem. What you need to do is to figure out where the corruption happened in the first place.
Very common causes:
calling free or delete on a pointer more than once
calling the wrong delete on a pointer (delete or delete[])
using an uninitialized pointer
using a pointer after free or delete was called on it
going out of bounds of an array (this is what I think you did)
casting a pointer to a wrong type
doing a reinterpret_cast where the target type cannot be reinterpreted correctly
calling functions with improper calling conventions
keeping a pointer to a temporary object
And there are many other ways.
The key to figuring this out is to
assume that your code is wrong
look for these kinds of problems by inspection in the code path (if short)
use tools that can tell you that you have these problems at the line of code where you did it
realize that the line of code where the segmentation fault happens is not necessarily where the bug is.
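For the out-of-bounds case in particular, the check before indexing can be wrapped in one small helper so that every write to line goes through it. This is only a sketch; append_char is a hypothetical name, and the int array simply mirrors the declaration in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Appends c to line[] only if there is room.
 * cap is the number of elements in line, *len the number already used.
 * Returns 1 on success, 0 if the buffer is already full. */
int append_char(int line[], int cap, int *len, int c)
{
    assert(line != NULL && len != NULL && *len >= 0);

    if (*len >= cap)
        return 0;          /* refuse to write past the end */
    line[(*len)++] = c;
    return 1;
}
```

On the tools side, Valgrind or a compiler's AddressSanitizer (-fsanitize=address in GCC and Clang) can usually report an out-of-bounds write at the line where it happens rather than where the crash finally shows up.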

Related

I am trying to create a code polisher program in C

I am trying to create the function delete_comments(). The read_file() and main functions are given.
Implement function char *delete_comments(char *input) that removes C comments from program stored at input. input variable points to dynamically allocated memory. The function returns pointer to the polished program. You may allocate a new memory block for the output, or modify the content directly in the input buffer.
You’ll need to process two types of comments:
Traditional block comments delimited by /* and */. These comments may span multiple lines. You should remove only characters starting from /* and ending to */ and for example leave any following newlines untouched.
Line comments starting with // until the newline character. In this case, newline character must also be removed.
The function calling delete_comments() only handles return pointer from delete_comments(). It does not allocate memory for any pointers. One way to implement delete_comments() function is to allocate memory for destination string. However, if new memory is allocated then the original memory in input must be released after use.
I'm having trouble understanding why my current approach is wrong or what is the specific problem that I'm getting weird output. I'm approaching the problem by trying to create a new array where to copy the input string with the new rules.
#include "source.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Remove C comments from the program stored in memory block <input>.
* Returns pointer to code after removal of comments.
* Calling code is responsible of freeing only the memory block returned by
* the function.
*/
char *delete_comments(char *input)
{
input = malloc(strlen(input) * sizeof (char));
char *secondarray = malloc(strlen(input) * sizeof (char));
int x, y = 0;
for (x = 0, y = 0; input[x] != '\0'; x++) {
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
else if ((input[x] == '/') && (input[x + 1] == '/')) {
int j = 0;
while (input[x + j] != '\n') {
y++;
j++;
}
}
else {
secondarray[x] = input[y];
y++;
}
}
return secondarray;
}
/* Read given file <filename> to dynamically allocated memory.
* Return pointer to the allocated memory with file content, or
* NULL on errors.
*/
char *read_file(const char *filename)
{
FILE *f = fopen(filename, "r");
if (!f)
return NULL;
char *buf = NULL;
unsigned int count = 0;
const unsigned int ReadBlock = 100;
unsigned int n;
do {
buf = realloc(buf, count + ReadBlock + 1);
n = fread(buf + count, 1, ReadBlock, f);
count += n;
} while (n == ReadBlock);
buf[count] = 0;
return buf;
}
int main(void)
{
char *code = read_file("testfile.c");
if (!code) {
printf("No code read");
return -1;
}
printf("-- Original:\n");
fputs(code, stdout);
code = delete_comments(code);
printf("-- Comments removed:\n");
fputs(code, stdout);
free(code);
}
Your program has fundamental issues.
It fails to tokenize the input. Comment start sequences can occur inside string literals, in which case they do not denote comments: "/* not a comment".
You have some basic bugs:
if ((input[x] == '/') && (input[x + 1] == '*')) {
    int i = 0;
    while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
        y++;
        i++;
    }
}
Here, when we enter the loop, with i = 0, input + x is still pointing to the opening /. We did not skip over the opening * and are already looking for a closing *. This means that the sequence /*/ will be recognized as a complete comment, which it isn't.
This loop also assumes that every /* comment is properly closed. It doesn't check for the null character that terminates the input, so if the comment is not closed, it will march beyond the end of the buffer.
C has line continuations. In ISO C translation phase 2, all backslash-newline sequences are deleted, converting one or more physical lines into logical lines. That means a // comment can span multiple physical lines:
// this is an \
extended comment
You can see, by the way, that StackOverflow's automatic language detector for syntax highlighting is getting this right!
Line continuations are independent of tokenization, which doesn't happen until translation phase 3. Which means:
/\
/\
this is an extended \
comment
That one has defeated StackOverflow's syntax highlighting.
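One way to cope with continuations like these is to do what the compiler does and splice the lines before looking for comments. The following is a minimal sketch of that step only; splice_lines is a made-up name, and note that it changes the line structure of the program, which may or may not be acceptable for the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Removes every backslash-newline pair from s in place,
 * roughly what translation phase 2 does.
 * Returns the new length of the string. */
size_t splice_lines(char *s)
{
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position, never ahead of r */

    assert(s != NULL);

    while (s[r] != '\0') {
        if (s[r] == '\\' && s[r + 1] == '\n') {
            r += 2;             /* drop the continuation */
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

After splicing, the two-line form of // shown above becomes an ordinary // followed by the comment text on one logical line, so a simple scanner can recognize it.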
Furthermore, a line continuation can happen in any token, possibly multiple times:
"\
this is a string literal\
"
If you really want to make this work 100% correctly, you need to parse the input. By "parse" I mean a more formal, rigorous detection routine that understands what it is reading, in the context it is reading it.
For example, there are many times where this code could be defeated.
printf("the answer is %d // %d\n", a, b);
would likely trip your // detection and strip the end of the printf.
There are two general approaches to the problem above:
Find every corner case where comment-like characters could be used, and write conditional statements to avoid them before stripping.
Fully parse the language, so you will know if you are within a string or some other context that's wrapping comment like characters, or if you are in the top level context where the characters really mean "this is a comment"
To learn about parsing, I generally recommend "The Dragon Book" but it is a hard read, unless you have studied a bit of Discrete Mathematics. It covers a lot of different parsing techniques, and in doing so it doesn't have many pages left for examples. This means that it's the kind of book where you have to read, think, and then program a mini-example. If you follow that path, there is no input you can't tackle.
If you are pragmatic in your solution, and it is not about learning parsing, but about stripping comments, I recommend that you find a well constructed parser for C, and then learn how to walk the Abstract Syntax Tree in an Emitter, which fails to emit the comments.
There are some projects that do this already, but I don't know if they have the right structure for easy modification. lint comes to mind, as well as other "pretty-printers". GCC certainly has the parsing code in there, but I've heard that GCC's Abstract Syntax Tree isn't easy to learn.
Your solution has several problems:
The worst issue
As the first instruction in delete_comments() you overwrite input with a new pointer returned by malloc(), which points to memory of random contents.
In consequence the address to the real input is lost.
Oh, and please check the returned value, if you call malloc().
Failing to increment the scanned position in comments correctly
You are scanning the input by the index x, but if you detect a comment, you don't change it.
You are actually advancing y but this is only used for the copying.
Think about lines like these:
int x; /* some /* weird /* comment */
///////////////////////////////
for (;;) { }
Ignoring character and string literals
Your solution should take character and string literals into account.
For example:
int c_plus_plus_comment_start = '//'; /* multi character constant */
const char* c_comment_start = "/*";
Note: There are more. Learn to use a debugger, or at least insert lots of printf()s in "interesting" places.
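Putting the points from these answers together, here is a hedged sketch of delete_comments() written as a small state machine. It works in place on the input buffer (which the assignment allows), keeps a single read position so comments are actually skipped, stops at the terminating null even inside an unclosed comment, and leaves string and character literals alone. It does not handle backslash-newline continuations, and it is not the course's reference solution.

```c
#include <assert.h>
#include <stddef.h>

/* Removes C comments from input in place and returns input.
 * Block comments are removed from the opening to the closing delimiter;
 * line comments are removed together with their newline. */
char *delete_comments(char *input)
{
    enum { CODE, STRING, CHARLIT, BLOCK, LINE } state = CODE;
    const char *r = input;   /* read position */
    char *w = input;         /* write position, never ahead of r */

    assert(input != NULL);

    while (*r != '\0') {
        switch (state) {
        case CODE:
            if (r[0] == '/' && r[1] == '*') { state = BLOCK; r += 2; continue; }
            if (r[0] == '/' && r[1] == '/') { state = LINE;  r += 2; continue; }
            if (*r == '"')
                state = STRING;
            else if (*r == '\'')
                state = CHARLIT;
            *w++ = *r++;
            break;
        case STRING:
        case CHARLIT:
            if (*r == '\\' && r[1] != '\0')
                *w++ = *r++;     /* copy the backslash; the escaped char follows */
            else if ((state == STRING && *r == '"') ||
                     (state == CHARLIT && *r == '\''))
                state = CODE;    /* closing quote ends the literal */
            *w++ = *r++;
            break;
        case BLOCK:
            if (r[0] == '*' && r[1] == '/') { state = CODE; r += 2; }
            else r++;
            break;
        case LINE:
            if (*r == '\n')
                state = CODE;
            r++;                 /* the newline is dropped as well */
            break;
        }
    }
    *w = '\0';
    return input;
}
```

Because the opening delimiter is skipped before the search for the closing one starts, a sequence like /*/ is not mistaken for a complete comment. Since the function returns the same pointer it was given, the caller's single free(code) still releases the right block.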

Segmentation fault when opening file in C?

I'm very new to programming and I'm trying to write code that reads "numbers.tsv4" (.tsv4 means tab separated values, 4 to a line) and puts the numbers into an array. Right now I'm just focusing on counting the amount of numbers in a file, so I can initialize the array size.
int main(void)
{
int cur;
FILE* spData;
int size=1;
spData = fopen("numbers.tsv4", "r");
while ((cur = fgetc(spData)) != EOF) {
if ((cur = fgetc(spData)) == '\t') {
size++;}
}
fclose(spData);
printf("%d", size);
return;
}
I keep getting a segmentation fault and I've changed so many things to try to figure it out. Could someone give me a hand? Thanks!
The structure with your while statement is the problem. At the beginning of each iteration, you are already getting the next character with fgetc() and assigning it to cur. Then inside the loop, in the if(...) statement, you discard the cur by calling a new fgetc() and assigning the result to cur. So, change it in the following way:
while ((cur = fgetc(spData)) != EOF) {
    if (cur == '\t') {
        size++;
    }
}
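Calling fgetc() twice per iteration (once in the while(...) and once in the if(...)) means every other character is thrown away, so some tabs are never counted. Reading at the end of the file does not crash by itself, though: once the end is reached, fgetc() simply keeps returning EOF. A segmentation fault here more likely means that fopen() failed and returned NULL (for example because numbers.tsv4 is not in the working directory), so check spData before you read from it.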
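Here is a small sketch of the counting step with a single fgetc() per iteration and a check on the fopen() result. The function names count_tabs and count_tabs_in_file are invented for this example; the question's size would be the returned tab count plus one.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the tab characters remaining in an already opened stream. */
long count_tabs(FILE *f)
{
    int c;           /* int, not char, so EOF can be told apart from data */
    long tabs = 0;

    assert(f != NULL);

    while ((c = fgetc(f)) != EOF) {
        if (c == '\t')
            tabs++;
    }
    return tabs;
}

/* Opens path, counts its tabs, and closes it again.
 * Returns -1 if the file cannot be opened. */
long count_tabs_in_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    long tabs = count_tabs(f);
    fclose(f);
    return tabs;
}
```

With this shape a missing numbers.tsv4 produces an error message and a -1 instead of a crash inside fgetc().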

Trouble \0 null terminating a string (C)

I seem to have some trouble getting my string to terminate with a \0. I'm not sure if this the problem, so I decided to make a post.
First of all, I declared my strings as:
char *input2[5];
Later in the program, I added this line of code to convert all remaining unused slots to become \0, changing them all to become null terminators. Could've done with a for loop, but yea.
while (c != 4) {
input2[c] = '\0';
c++;
}
In Eclipse when in debug mode, I see that the empty slots now contain 0x0, not \0. Are these the same things? The other string where I declared it as
char input[15] = "";
shows \000 when in debug mode though.
My problem is that I am getting segmentation faults (on a Debian VM; it works on my Ubuntu 12.04 though). My GUESS is that because the string hasn't really been terminated, the compiler doesn't know when it stops and thus continues to try to access memory in the array when it is clearly already out of bounds.
Edit: I will try to answer all other questions soon, but when I change my string declaration to the other suggested one, my program crashes. There is a strtok() function, used to chop my fgets input into strings and then putting them into my input2 array.
So,
input1[0] = 'l'
input1[1] = 's'
input1[2] = '\n'
input2[0] = "ls".
This is a shell simulating program with fork and execvp. I will post more code soon.
Regarding the suggestion:
char *input2[5]; This is a perfectly legal declaration, but it
defined input2 as an array of pointers. To contain a string, it needs
to be an array of char.
I will try that change again. I did try that earlier, but I remember it giving me another run-time error (seg fault?). I think it is because of the way I implemented my strtok() function though. I will check it out again. Thanks!
EDIT 2: I added a response below to update my progress so far. Thanks for all the help!
Your code should rather look like this:
char input2[5];
for (int c = 0; c < 4; c++) {
    input2[c] = '\0';
}
0x0 and \0 are different representations of the same value, 0.
Response 1:
Thanks for all the answers!
I made some changes from the responses, but I reverted the char suggestion (or correct string declaration) because like someone pointed out, I have a strtok function. Strtok requires me to send in a char *, so I reverted back to what I originally had (char * input[5]). I posted my code up to strtok below. My problem is that the program works fine in my Ubuntu 12.04, but gives me a segfault error when I try to run it on the Debian VM.
I am pretty confused as I originally thought the error was because the compiler was trying to access an array index that is already out of bound. That doesn't seem like the problem because a lot of people mentioned that 0x0 is just another way of writing \000. I have posted my debug window's variable section below. Everything seems right though as far as I can see.. hmm..
Input2[0] and input[0], input[1 ] are the focus points.
Here is my code up to the strtok function. The rest is just fork and then execvp call:
int flag = 0;
int i = 0;
int status;
char *s; //for strchr, strtok
char input[15] = "";
char *input2[5];
//char input2[5];
//Prompt
printf("Please enter prompt:\n");
//Reads in input
fgets(input, 100, stdin);
//Remove \n
int len = strlen(input);
if (len > 0 && input[len-1] == '\n')
input[len-1] = ' ';
//At end of string (numb of args), add \0
//Check for & via strchr
s = strchr (input, '&');
if (s != NULL) { //If there is a &
printf("'&' detected. Program not waiting.\n");
//printf ("'&' Found at %s\n", s);
flag = 1;
}
//Now for strtok
input2[i] = strtok(input, " "); //strtok: returns a pointer to the last token found in string, so must declare
//input2 as char * I believe
while(input2[i] != NULL)
{
input2[++i] = strtok( NULL, " ");
}
if (flag == 1) {
i = i - 1; //Removes & from total number of arguments
}
//Sets null terminator for unused slots. (Is this step necessary? Does the C compiler know when to stop?)
int c = i;
while (c < 5) {
input2[c] = '\0';
c++;
}
Q: Why didn't you declare your string char input[5];? Do you really need the extra level of indirection?
Q: while (c < 4) is safer. And be sure to initialize "c"!
And yes, "0x0" in the debugger and '\0' in your source code are "the same thing".
SUGGESTED CHANGE:
char input2[5];
...
c = 0;
while (c < 4) {
input2[c] = '\0';
c++;
}
This will almost certainly fix your segmentation violation.
char *input2[5];
This is a perfectly legal declaration, but it defined input2 as an array of pointers. To contain a string, it needs to be an array of char.
while (c != 4) {
input2[c] = '\0';
c++;
}
Again, this is legal, but since input2 is an array of pointers, input2[c] is a pointer (of type char*). The rules for null pointer constants are such that '\0' is a valid null pointer constant. The assignment is equivalent to:
input2[c] = NULL;
I don't know what you're trying to do with input2. If you pass it to a function expecting a char* that points to a string, your code won't compile -- or at least you'll get a warning.
But if you want input2 to hold a string, it needs to be defined as:
char input2[5];
It's just unfortunate that the error you made happens to be one that a C compiler doesn't necessarily diagnose. (There are too many different flavors of "zero" in C, and they're often quietly interchangeable.)
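Since the asker's later code feeds strtok() results into an argument list for execvp(), an array of char * terminated by a null pointer is the shape execvp() expects. What that code is missing is bounds: it calls fgets(input, 100, stdin) on a 15-byte buffer, and its strtok() loop has no limit on i. The following is a minimal sketch of a bounded split; split_args and MAX_ARGS are names I made up, and splitting on spaces and newlines only is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 5   /* matches the size of input2 in the question */

/* Splits line in place on spaces and newlines.
 * Stores at most max_args - 1 token pointers in args[] and always
 * terminates the list with a null pointer, as execvp() requires.
 * Returns the number of tokens stored. */
int split_args(char *line, char *args[], int max_args)
{
    int n = 0;

    assert(line != NULL && args != NULL && max_args > 0);

    char *tok = strtok(line, " \n");
    while (tok != NULL && n < max_args - 1) {
        args[n++] = tok;
        tok = strtok(NULL, " \n");
    }
    args[n] = NULL;
    return n;
}
```

On the reading side, fgets(input, sizeof input, stdin) keeps the read within the buffer whatever size input is declared with.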

C - reading past end of file with fgetc

I have the weirdest thing happening, and I'm not quite sure why it's happening. Basically what I need to do is use fgetc to get the contents of a simple ASCII file byte by byte. The weird part is it worked, but then I added a few more characters and all of a sudden it added a newline that wasn't there and read past the end of the file or something. Literally all I did was
do {
temp = (char*) checked_realloc (temp, n+1);
e = fgetc(get_next_byte_argument);
temp[n] = e;
if (e != EOF)
n++;
}
while (e != EOF);
And then to check I just printed each character out
temp_size = strlen(temp)-1;
for(debug_k = 0; debug_k < temp_size; debug_k++){
printf("%c", temp[debug_k]);
}
And it outputs everything correctly except it added an extra newline that wasn't in the file. Before that, I had
temp_size = strlen(temp);
But then it ended on some unknown byte (that printed gibberish). I tried strlen(temp)-2 just in case and it worked for that particular file, but then I added an extra "a" to the end and it broke again.
I'm honestly stumped. I have no idea why it's doing this.
EDIT: checked_realloc is just realloc but with a quick check to make sure I'm not out of memory. I realize this is not the most efficient way to do this, but I'm more worried about why I seem to be magically reading in extra bytes.
A safer way to write such operation is:
memset the memory block with zeros before use if you allocate it prior to the realloc, and initialize any newly added memory to zero every time you realloc.
If you use a block of memory as a string, or call string functions on it, always make sure it is terminated with a null byte.
do {
    temp = (char*) checked_realloc (temp, n+1); //I guess you are starting n with 0?
    temp[n] = 0;
    e = fgetc(get_next_byte_argument);
    temp[n] = e;
    if (e != EOF)
        n++;
} while (e != EOF);
temp[n] = 0;
n = 0;
I guess the above code change should fix your issue. You don't need strlen -1 anymore. :)
Cheers.
It sounds like you forgot to null terminate your string. Add temp[n] = 0; just after the while.
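For comparison, here is a self-contained sketch of the same idea that keeps the length explicitly and always reserves a byte for the terminator, so neither strlen(temp)-1 nor strlen(temp)-2 is needed. It uses plain malloc/realloc instead of the asker's checked_realloc, and the name read_all and the starting capacity are my own choices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads f up to EOF into a heap buffer that is always null-terminated.
 * Stores the number of bytes read in *len_out if it is not NULL.
 * Returns NULL if memory runs out; the caller frees the result. */
char *read_all(FILE *f, size_t *len_out)
{
    size_t cap = 64;   /* arbitrary starting capacity */
    size_t n = 0;
    int c;             /* int so that EOF is never stored as data */

    assert(f != NULL);

    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = fgetc(f)) != EOF) {
        if (n + 1 >= cap) {             /* keep one byte free for '\0' */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[n++] = (char)c;
    }
    buf[n] = '\0';

    if (len_out != NULL)
        *len_out = n;
    return buf;
}
```

Growing the buffer by doubling rather than one byte per character is a design choice of this sketch, not something the fix depends on.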

Skip white space and return one word at a time in C

This code is supposed to skip white space and return one word at a time. A couple of questions on this code: when the code gets to the *word++=c; line I get a core dump. Have I written this line correctly? Is the return correct? And do I need to somehow allocate memory to store the word?
//get_word
int get_word(char *word,int lim){
int i=0;
int c;
int quotes=0;
int inword = 1;
while(
inword &&
(i < (lim-1)) &&
((c=getchar()) != EOF)
){
if(c==('\"')){//this is so i can get a "string"
if (quotes) {
inword = 0;
}
quotes = ! quotes;
}
else if(quotes){ //if in a string keep storing til the end of the string
*word++=c;//pointer word gets c and increments the pointer
i++;
}
else if(!isspace(c)) {//if not in string store
*word++=c;
i++;
}
else {
// Only end if we have read some character ...
if (i)
inword = 0;
}
}
*word='\0'; //null at the end to signify
return i; //value
}
It's impossible to tell why this core dumps without seeing the code that calls get_word. The failure at the line you named implies that you are passing it something invalid in the first parameter. There's nothing wrong with that line in and of itself, but if word does not point to writable memory large enough to hold your output characters, you are in trouble.
The answer to your question about allocating memory to hold it is yes. However, this could be local (e.g. a char array in the caller's local variables), global, or heap-based (e.g. from char * wordHolder = malloc(wordLimit);). The fact that you are asking this supports the guess that your parameter 1 value is the problem.
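To illustrate the point about the caller owning the storage, here is a sketch of the same logic reading from a FILE * instead of stdin (so it can be exercised without typing input), written to be used with a caller-provided array. The name get_word_from and the WORD_LIMIT value are hypothetical; the word-splitting rules follow the question's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define WORD_LIMIT 32   /* assumed buffer size for the caller's array */

/* Skips leading white space and stores the next word, or the contents of
 * a "quoted string", in word[]. Writes at most lim - 1 characters plus
 * '\0'. Returns the number of characters stored. */
int get_word_from(FILE *in, char *word, int lim)
{
    int i = 0;
    int c;
    int quotes = 0;
    int inword = 1;

    assert(in != NULL && word != NULL && lim > 0);

    while (inword && i < lim - 1 && (c = fgetc(in)) != EOF) {
        if (c == '"') {
            if (quotes)
                inword = 0;          /* closing quote ends the word */
            quotes = !quotes;
        } else if (quotes || !isspace(c)) {
            word[i++] = (char)c;     /* word must point at writable memory */
        } else if (i > 0) {
            inword = 0;              /* white space after at least one char */
        }
    }
    word[i] = '\0';
    return i;
}
```

The caller then supplies real storage, for example char word[WORD_LIMIT]; followed by get_word_from(stdin, word, sizeof word);. Passing an uninitialized char * instead is the kind of mistake that would explain a core dump at the *word++=c; line.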
