Reading file with fgetc and adding sentences into linked list - c

I have been stuck on this problem for the last three days.
I have a file containing sentences.
I read the file with:
int maxSize = 256;
int currSize = 0;
int i = 0;
char *sentence = (char*)malloc(maxSize);
char c;
currSize = maxSize;
while ((c = fgetc(input)) != EOF)
{
    sentence[i++] = c;
    while((c = fgetc(input)) != '\n')
    {
        sentence[i++] = c;
        if((c == '.') || (c == '?') || (c == '!'))
            sentence[i++] = '\n';
        if(i == currSize)
        {
            currSize = i + maxSize;
            sentence = (char*)realloc(sentence,currSize);
        }
    }
}
sentence[i] = '\0';
addSentence(sentence);
When addSentence adds sentences to the linked list, there is a problem: it adds only one sentence, made up of everything that is in the file.
I'm a beginner in C. Thank you.

Your problem is that you only call addSentence() at the EOF, so it doesn't magically get to see anything before you have read the whole file. Presumably, you need to call it when you detect the end of a sentence (with the test for '.', '?' or '!' — you'll also need to null terminate the string before calling addSentence and reset the memory with a new allocation and the correct size) as well as at EOF. It's not clear why you have two loops; you could miss some newlines as end of sentence. Rework with just one loop.
It's not entirely clear if newlines mark the ends of sentences. This revision assumes that they do:
int maxSize = 256;
int currSize = maxSize;
int i = 0;
int c;
char *sentence = (char*)malloc(maxSize);
assert(sentence != 0); // Not a production-ready error check
while ((c = fgetc(input)) != EOF)
{
    sentence[i++] = c;
    if ((c == '\n') || (c == '.') || (c == '?') || (c == '!'))
    {
        if (c != '\n')
            sentence[i++] = '\n';
        sentence[i] = '\0';
        addSentence(sentence);
        sentence = malloc(maxSize);
        assert(sentence != 0); // Not a production-ready error check
        currSize = maxSize;
        i = 0;
    }
    if (i == currSize)
    {
        currSize = i + maxSize;
        sentence = (char*)realloc(sentence, currSize);
        assert(sentence != 0); // Not a production-ready error check
    }
}
sentence[i] = '\0';
addSentence(sentence);
Note that the error checking for failed memory allocation is not production quality; there should be some proper, unconditional error checking. There is a small risk of buffer overflow if the end of sentence punctuation falls in exactly the wrong place. Production code should avoid that, too, but it would be fiddlier. I'd use a string data type and a function to do the adding. I'd probably also take a guess that most sentences are shorter than 256 characters (especially if newlines mark the end), and would use maxSize of 64. It would lead to less unused memory being allocated.
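The overflow risk mentioned above goes away if the buffer is grown before each character is stored, so that there is always room for the character and the terminating null byte. The following is only a sketch under some assumptions: read_sentence is a hypothetical helper (it is not in the original code), it returns one sentence per call instead of calling addSentence itself, and it does not append the extra '\n' after punctuation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the next sentence as a malloc'd string,
 * or NULL at end of input or on allocation failure. The caller frees it. */
char *read_sentence(FILE *in)
{
    size_t cap = 64;   /* initial guess; most sentences are short */
    size_t len = 0;
    char *buf = malloc(cap);
    int c;             /* int, so that EOF is distinct from every character */

    if (buf == NULL)
        return NULL;

    while ((c = fgetc(in)) != EOF) {
        /* Grow first, so there is always room for c and the final '\0'. */
        if (len + 2 > cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n' || c == '.' || c == '?' || c == '!')
            break;     /* end of sentence */
    }

    if (len == 0) {    /* nothing was read: end of input */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A caller could then loop with while ((s = read_sentence(input)) != NULL) addSentence(s); assuming addSentence keeps the pointer rather than copying the string.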

Related

Printf for string goes down a line unwillingly

This is my program (school exercise, should be receiving a string from the user, change it and return the original and new string in a certain format):
#include <stdio.h>
#define MAX_STRING_LENGTH 50
char switchChar(char c) {
    if ((c >= 'A') && (c <= 'Z')) {
        c = c + 32;
    } else
    if ((c >= 'a') && (c <= 'z')) {
        c = c - 32;
    }
    if ((c > '5') && (c <= '9')) {
        c = 56;
    }
    if ((c >= '0') && (c < '5')) {
        c = 48;
    }
    return c;
}
int main(void) {
    char temp;
    int i = 0;
    char stringInput[MAX_STRING_LENGTH + 1];
    printf("Please enter a valid string\n");
    fgets(stringInput, 50, stdin);
    char newString[MAX_STRING_LENGTH + 1];
    while ((i != MAX_STRING_LENGTH + 1) && (stringInput[i] != '\0')) {
        temp = switchChar(stringInput[i]);
        newString[i] = temp;
        i++;
    }
    printf( "\"%s\"", stringInput);
    printf("->");
    printf( "\"%s\"", newString);
    return 0;
}
When running, the output goes down a line after the string and before the last " character, although it should all be printed in the same line.
I would appreciate any directions.
There are several issues in your code:
fgets() reads and leaves the newline character at the end of the destination array if present and if enough space is available. For consistency with your algorithm, you should strip this newline. You can do this safely with stringInput[strcspn(stringInput, "\n")] = '\0'; or use a little more code if you cannot use <string.h>. The presence of this newline character explains the observed undesirable behavior.
You read a line with fgets(), but you pass a buffer size that might be incorrect: it is hard-coded to 50 when the array size is MAX_STRING_LENGTH + 1. With MAX_STRING_LENGTH defined as 50, it is not a problem, but if you later change the definition of the macro, you might forget to update the size argument to fgets(). Use sizeof stringInput for consistency.
You forget to set the null terminator in newString. Testing the boundary value for i is not necessary, as stringInput is null terminated within the array boundaries.
In switchChar(), you should not hard-code character values from the ASCII charset: it reduces portability and, most importantly, readability.
Here is a corrected and simplified version:
#include <stdio.h>
#define MAX_STRING_LENGTH 50
char switchChar(char c) {
    if ((c >= 'A') && (c <= 'Z')) {
        c = c + ('a' - 'A');
    } else
    if ((c >= 'a') && (c <= 'z')) {
        c = c - ('a' - 'A');
    } else
    if ((c > '5') && (c <= '9')) {
        c = '8';
    } else
    if ((c >= '0') && (c < '5')) {
        c = '0';
    }
    return c;
}
int main(void) {
    char stringInput[MAX_STRING_LENGTH + 1];
    char newString[MAX_STRING_LENGTH + 1];
    int i;
    printf("Please enter a valid string\n");
    if (fgets(stringInput, sizeof stringInput, stdin) != NULL) {
        // strip the newline character if present
        //stringInput[strcspn(stringInput, "\n")] = '\0';
        char *p;
        for (p = stringInput; *p != '\0' && *p != '\n'; p++)
            continue;
        *p = '\0';
        for (i = 0; stringInput[i] != '\0'; i++) {
            newString[i] = switchChar(stringInput[i]);
        }
        newString[i] = '\0';
        printf("\"%s\"", stringInput);
        printf("->");
        printf("\"%s\"", newString);
        printf("\n");
    }
    return 0;
}
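As an alternative to the arithmetic on 'a' - 'A', the case swap can be written with the standard <ctype.h> functions. This is only a sketch: switch_char is a hypothetical name, and the result depends on the current locale (in the default "C" locale it matches the version above for ASCII letters).

```c
#include <ctype.h>

/* Hypothetical variant of switchChar() built on <ctype.h>. */
char switch_char(char c)
{
    /* The ctype functions require a value representable as unsigned char. */
    unsigned char uc = (unsigned char)c;

    if (isupper(uc))
        return (char)tolower(uc);
    if (islower(uc))
        return (char)toupper(uc);
    if (c > '5' && c <= '9')    /* digits are contiguous in every C charset */
        return '8';
    if (c >= '0' && c < '5')
        return '0';
    return c;                   /* '5' and everything else are unchanged */
}
```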
It's because fgets() reads in the newline character as well if there's room in the buffer and it's stored in your newString.
You can remove it with:
fgets(stringInput,50,stdin);
stringInput[strcspn(stringInput, "\n")] = 0; /* removes the trailing newline if any */
From fgets():
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
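Because fgets() stores the newline only when the whole line fits, the presence of the newline also tells you whether the line was read completely. A minimal sketch, where chomp is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: removes the newline that fgets() may have stored.
 * Returns 1 if a newline was found and removed, 0 otherwise. */
int chomp(char *s)
{
    size_t n = strcspn(s, "\n");   /* index of the first '\n', or of the '\0' */

    if (s[n] == '\n') {
        s[n] = '\0';
        return 1;
    }
    return 0;   /* no newline: the line was truncated or input ended without one */
}
```

A return value of 0 after a successful fgets() means the rest of the line, if any, is still waiting in the input stream.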
Your requirements are:
get only one string
no special processing for blank characters
In that case, scanf is probably better suited than fgets, because it skips any leading blanks (space or tab) and stops before the first trailing blank (space, tab, CR or newline). Note that because scanf stops before the first blank, the string cannot contain spaces or tabs. If that is a problem, use fgets.
Just replace the line:
fgets(stringInput, 50, stdin);
with:
i = scanf("%50s", stringInput);
if (i != 1) { /* always check the return value of input functions */
    perror("Could not get input string");
    return 1;
}
If you prefer to use fgets for any reason, you should remove the (optional) trailing newline:
if (NULL == fgets(stringInput, 50, stdin)) { /* check the input */
    perror("Could not get input string");
    return 1;
}
size_t l = strlen(stringInput); /* needs <string.h> */
if ((l > 0) && (stringInput[l - 1] == '\n')) { /* test for a trailing newline */
    stringInput[l - 1] = '\0'; /* remove it if found */
}
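The whitespace behaviour of %s described above can be checked without typing input by using sscanf, which applies the same conversion rules to a string. This is a sketch; first_word is a hypothetical helper, and the 51-byte buffer matches the %50s width plus the terminating null byte.

```c
#include <stdio.h>

/* Hypothetical helper: copies the first whitespace-delimited word of line
 * into word, which must hold at least 51 bytes. Returns 1 on success and
 * 0 if the line contains no word at all. */
int first_word(const char *line, char *word)
{
    /* %50s skips leading whitespace, then reads at most 50 non-blank chars. */
    return sscanf(line, "%50s", word) == 1;
}
```

With the input "  \thello world\n" the word is "hello": the leading blanks are skipped and reading stops at the space before "world".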

C Program won't remove comments that take up the whole line

So I'm working through the K&R C book and there was a bug in my code that I simply cannot figure out.
The program is supposed to remove all the comments from a C program. Obviously I'm just using stdin
#include <stdio.h>
int getaline (char s[], int lim);
#define MAXLINE 1000 //maximum number of characters to put into string[]
#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2
int main(void)
{
    int i;
    int isInComment;
    char string[MAXLINE];
    getaline(string, MAXLINE);
    for (i = 0; string[i] != EOF; ++i) {
        //finds whether loop is in a comment or not
        if (string[i] == '/') {
            if (string[i+1] == '/')
                isInComment = INASINGLECOMMENT;
            if (string[i+1] == '*')
                isInComment = INMULTICOMMENT;
        }
        //fixes the problem of print messing up after the comment
        if (isInComment == INASINGLECOMMENT && string[i] == '\0')
            printf("\n");
        //if the line is done, restates all the variables
        if (string[i] == '\0') {
            getaline(string, MAXLINE);
            i = 0;
            if (isInComment != INMULTICOMMENT)
                isInComment = OUTOFCOMMENT;
        }
        //prints current character in loop
        if(isInComment == OUTOFCOMMENT && string[i] != EOF)
            printf("%c", string[i]);
        //checks to see of multiline comment is over
        if(string[i] == '*' && string[i+1] == '/' ) {
            ++i;
            isInComment = OUTOFCOMMENT;
        }
    }
    return 0;
}
So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.
So for instance, if I had a line that was simply
//this is a comment
without anything before the comment begins, it will print that comment even though it's not supposed to.
I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.
EDIT: Forgot the getaline function
//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
    int c, i;
    for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
        s[i] = c;
    if (c == '\n') {
        s[i] = c;
        ++i;
    }
    s[i] = '\0';
    return i;
}
There are many problems in your code:
isInComment is not initialized in function main.
As pointed out by others, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. It will nonetheless mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for single line comment... hence the bug!
You do not handle special cases such as /* // */ or // /*
You do not handle strings. This is not a comment: "/*", nor this: '//'
You do not handle \ at end of line (escaped linefeed). This can be used to extend single line comments, strings, etc. There are more subtle cases related to \ handling and if you really want completeness, you should handle trigraphs too.
Your implementation has a limit for line size, this is not needed.
The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
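The character-at-a-time state machine suggested above could look roughly like this. It is a sketch, not a complete solution: strip_comments and the state names are hypothetical, and escaped linefeeds (backslash at end of line) and trigraphs are deliberately not handled.

```c
#include <stdio.h>

enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
             STRING, STRING_ESC, CHAR_LIT, CHAR_ESC };

/* Copies in to out, dropping both kinds of comments.
 * Comment markers inside string and character literals are left alone. */
void strip_comments(FILE *in, FILE *out)
{
    enum state st = CODE;
    int c;

    while ((c = getc(in)) != EOF) {
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; }  /* hold the '/' back */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHAR_LIT;
            putc(c, out);
            break;
        case SLASH:
            if (c == '/') st = LINE_COMMENT;
            else if (c == '*') st = BLOCK_COMMENT;
            else {                 /* plain division: emit '/', reprocess c */
                putc('/', out);
                ungetc(c, in);
                st = CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { putc(c, out); st = CODE; }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') { putc(' ', out); st = CODE; } /* comment -> one space */
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case STRING:
            putc(c, out);
            if (c == '\\') st = STRING_ESC;
            else if (c == '"') st = CODE;
            break;
        case STRING_ESC:           /* character after a backslash is copied as is */
            putc(c, out);
            st = STRING;
            break;
        case CHAR_LIT:
            putc(c, out);
            if (c == '\\') st = CHAR_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHAR_ESC:
            putc(c, out);
            st = CHAR_LIT;
            break;
        }
    }
    if (st == SLASH)
        putc('/', out);            /* input ended right after a lone '/' */
}
```

Calling strip_comments(stdin, stdout) gives a filter with no line-length limit, since no line buffer is involved.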
if (string[i] == '\0') {
    getaline(string, MAXLINE);
    i = 0;
    if (isInComment != INMULTICOMMENT)
        isInComment = OUTOFCOMMENT;
}
When you start a new line, you reset i to 0. But then, before the next iteration, the increment in
for (i = 0; string[i] != EOF; ++i)
runs, so you begin the new line at index 1. That is why there is a bug when the line begins with //.
You can see that the problem goes away if you write instead:
if (string[i] == '\0') {
    getaline(string, MAXLINE);
    i = -1; /* the loop's ++i brings the index back to 0 */
    if (isInComment != INMULTICOMMENT)
        isInComment = OUTOFCOMMENT;
    continue; /* skip the rest of the body so string[-1] is never read */
}
though it is usually considered bad style to modify a for loop index inside the loop. You may want to redesign your implementation in a more readable way.

Can't assign value to a variable inside a for loop

Here is what I want to do:
Read all characters from a '.c' file and store that into an array.
When a character from that array is '{', it will be pushed into a stack. And count of pushed characters will be increased by 1.
When a character from that array is '}', stack will pop and the count of popped characters will be increased by 1.
Compare those two counts to check whether there is a missing '{' or '}'
Here is my code:
int getLinesSyntax(char s[], int limit, FILE *cfile)
{
    int i, c, push_count = 0, pop_count = 0;
    int state = CODE;
    int brackets[limit];
    char braces[limit];
    for(i = 0; i < 100; i++)
    {
        braces[i] = 0;
    }
    for(i = 0; i < limit - 1 && (c = getc(cfile)) != EOF && c != '\n'; i++)
    {
        s[i] = c;
        if(s[i] == '{')
        {
            braces[0] = s[i];
            //push(s[i], braces);
            ++push_count;
        }
        else if(s[i] == '}')
        {
            pop(braces);
            ++pop_count;
        }
    }
    //When moving to a new line, append 0 at the end of the array
    if(c == '\n')
    {
        s[i] = c;
        i++;
    }
    s[i] = '\0';
    i = i -1; //Subtract the 0 appended last from the count
    if(c == EOF)
    {
        //just checking
        for(i = 0; i < 100; i++)
        {
            printf("%d", braces[i]);
        }
        if(push_count != pop_count)
        {
            printf("%d and %d syntax error: braces", push_count, pop_count);
        }
        return -1;
    }
    else
    {
        return i;
    }
}
Here is the output
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The problem is:
Assignments inside the for loop are not working. (They work when I put them outside the loop.)
I would like to know if there's something wrong with my code :).
There are several problems.
Let's go through it step by step.
1) Your array initialization loop:
int brackets[limit];
char braces[limit];
for(i = 0; i < 100; i++)
{
    braces[i] = 0;
}
You declare the arrays with size limit but always initialize exactly 100 items. Change 100 to limit so that the whole array is initialized whatever value is passed to the function, and so that the loop cannot write past the end of the array when limit is less than 100.
2) The conditional statement of the main for loop:
i < limit - 1 && (c = getc(cfile)) != EOF && c != '\n'
Although the first subexpression is correct, I have two remarks:
Firstly, (c = getc(cfile)) != EOF might be one reason why the loop body is never entered and everything is still 000000.... Check that the file exists, that the pointer is not NULL, and that no other silent error occurred.
Secondly, the c != '\n'. What happens when a newline occurs? In that case you don't continue with the next iteration but break out of the entire for loop. Remove it from the condition and put it in the first line of the body like this:
if(c == '\n')
{
    i -= 1; // to really skip the character and maintain the index.
    continue;
}
3) s[i] = c;
Can you be certain that the array s really has at least limit elements?
4) Checking for curly braces
if(s[i] == '{')
{
    braces[0] = s[i];
    //push(s[i], braces);
    ++push_count;
}
else if(s[i] == '}')
{
    pop(braces);
    ++pop_count;
}
You assign to braces[0] always, why?
5) Uninitialized access
if(c == '\n')
{
    s[i] = c;
    i++;
}
s[i] = '\0';
i = i -1; //Subtract the 0 appended last from the count
You're now using the function-wide variable i, which is never reinitialized for this block. Reusing one variable everywhere is not a problem from a memory point of view, but you are relying on values left over from earlier code. Is this done on purpose? If not, reinitialize i properly. I have to ask because I couldn't read the comments in your code.
What I'm quite unhappy about is that you rely entirely on one variable in all the loops and statements. Usually a loop index should not be altered from inside the loop. Maybe you can come up with a cleaner design, such as an additional index variable that you increase in parallel without altering i. The additional index would be used for array access where appropriate, whereas i remains just a counter.
I think the problem is the condition c != '\n', which ends the for loop at the end of the first line, before it reaches any braces. Hence the output.
For the task of counting whether there are balanced braces in the data, the code is excessively complex. You could simply use:
int l_brace = 0;
int r_brace = 0;
int c;
while ((c = getchar()) != EOF)
{
    if (c == '{')
        l_brace++;
    else if (c == '}')
        r_brace++;
}
if (l_brace != r_brace)
    printf("Number of { = %d; number of } = %d\n", l_brace, r_brace);
Of course, this can be confused by code such as:
/* This is a comment with an { in it */
char string[] = "{{{";
char c = '{';
There are no braces that mark control-of-flow statement grouping in that fragment, even though there are 5 left braces ({) in the source code. Parsing C properly is hard work.
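Comparing the two totals also accepts input such as "} {", where a closing brace comes before its opening brace. Tracking a single nesting depth, which is what the stack in the question amounts to, catches that case. A sketch with a hypothetical function name; like the counting loop, it does not skip comments, strings or character constants:

```c
/* Hypothetical helper: returns 0 if the braces in src are balanced,
 * -1 if a '}' appears with no matching '{' before it, and otherwise
 * the number of '{' that were never closed. */
int brace_depth_check(const char *src)
{
    int depth = 0;

    for (; *src != '\0'; src++) {
        if (*src == '{')
            depth++;
        else if (*src == '}' && --depth < 0)
            return -1;   /* closed more braces than were opened so far */
    }
    return depth;
}
```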

C: character comparison fails

After successfully running an entabulator, my detabulator won't pick up on a character comparison that should exit a while loop. After trying "0(tab)8(enter)(ctrl+D)" as input, the tab is written correctly as spaces, but after rp is incremented to point to the 8, the while loop that should read the 8 won't exit and I get a seg fault. Here's the code:
#include <string.h>
#include <stdio.h>
#define MAXLINE 100
char doc[9001];
main(int argc, char *argv[])
{
    int max = 0;
    char *rp = doc;
    char *wp = rp;
    char *tf = wp;
    char *lp = doc;
    while ((*(rp++) = getchar()) != EOF);
    *--rp = '\0';
    rp = doc;
    while ( (*rp != '\0') && (argc == 1)) {
        if (*rp == '\n') {
            lp = rp + 1;
            *wp++ = *rp++;
        }
        while( (*rp != '\t') && (*rp != '\0') && (*rp != '\n') ) { /*this loops after a tab*/
            *wp++ = *rp++;
        }
        if (*rp == '\t') {
            rp++;
            tf = lp + ((((wp - lp) / 8) + 1) * 8);
            while ((tf - wp) != 0)
                *wp++ = 's';
        }
    }
    if (*rp == '\0')
        *wp = '\0';
    printf("%s\n", doc);
}
There are some as yet unexplored problems with the initial input loop.
You should never risk overflowing a buffer, even if you allocate 9001 bytes for it. That's how viruses and things break into programs. Also, you have a problem because you are comparing a character with EOF. Unfortunately, getchar() returns an int: it has to, because it returns any valid character value as a non-negative value, and EOF as a negative value (usually -1, but nothing guarantees that value).
So, you might write that loop more safely, and clearly, as:
char *end = doc + sizeof(doc) - 1;
int c;
while (rp < end && (c = getchar()) != EOF)
    *rp++ = c;
*rp = '\0';
With your loop as written, one of two undesirable things happens:
if char is an unsigned type, then you will never detect EOF.
if char is a signed type, then you will detect EOF when you read a valid character (often ÿ, y-umlaut, LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF).
Neither is good. The code above avoids both problems without needing to know whether plain char is signed or unsigned.
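To see that the int version really is immune to the 0xFF problem, the loop can be wrapped in a function and fed a file that contains that byte. This is a sketch; read_all is a hypothetical name, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most size - 1 bytes from in into buf,
 * null terminates the result and returns the number of bytes stored. */
size_t read_all(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c;    /* int, not char: EOF must stay distinct from the byte 0xFF */

    if (size == 0)
        return 0;
    while (n + 1 < size && (c = getc(in)) != EOF)
        buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

With a file holding the three bytes 'a', 0xFF and 'z', the function returns 3 whether plain char is signed or unsigned.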
Conventionally, if you have an empty loop body, you emphasize this by placing the semicolon on a line of its own. Many an infinite loop has been caused by a stray semicolon after a while condition; by placing the semicolon on the next line, you emphasize that it is intentional, not accidental. Instead of:
while ((*(rp++) = getchar()) != EOF);
write:
while ((*(rp++) = getchar()) != EOF)
    ;
I think the loop below runs forever:
while( (*rp != '\t') && (*rp != '\0') && (*rp != '\n') ) { /*this loops after a tab*/
    *wp++ = *rp++;
}
It only stops when *rp is '\t', '\n' or '\0', but here
if (*rp == '\t')
{
    rp++;
    tf = lp + ((((wp - lp) / 8) + 1) * 8);
    while ((tf - wp) != 0)
        *wp++ = 's';
}
you fill the doc array with the character 's', which overwrites input that has not been read yet (including the terminating '\0'), so the loop above never finds a character that stops it.
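Expanding tabs in place cannot work in general, because the output is longer than the input and the write pointer overtakes the read pointer. One way around that is to write into a separate buffer. The following is a sketch under that assumption; detab and TAB_WIDTH are hypothetical names, and real spaces are written instead of the 's' marker.

```c
#include <stddef.h>

#define TAB_WIDTH 8

/* Hypothetical helper: copies in to out, expanding each tab to spaces up to
 * the next multiple of TAB_WIDTH. Output is truncated if it does not fit.
 * Returns the number of characters written, not counting the '\0'. */
size_t detab(const char *in, char *out, size_t out_size)
{
    size_t col = 0;   /* column on the current output line */
    size_t n = 0;     /* characters written so far */

    if (out_size == 0)
        return 0;
    for (; *in != '\0' && n + 1 < out_size; in++) {
        if (*in == '\t') {
            /* A tab always produces at least one space. */
            do {
                out[n++] = ' ';
                col++;
            } while (col % TAB_WIDTH != 0 && n + 1 < out_size);
        } else {
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[n] = '\0';
    return n;
}
```

For the input from the question, "0", tab, "8", newline, the output is "0", seven spaces, "8", newline.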

C: segmentation fault

I've created this function to read a word. I got segmentation fault and I can't find the problem. Here's what I've done.
void LeeCaracter(FILE * fp, char * s)
{
    char c;
    int i = 0;
    c = fgetc(fp);
    while(c==' ' || c=='\t' || c=='\n')
        c = fgetc(fp);
    while(c!=' ' && c!='\n')
    {
        s[i] = c;
        i++;
        c = fgetc(fp);
    }
    s[i] = '\0';
}
s is a pointer parameter, as I have to use it later. Is it correct to write it with just one *? Thanks for your help!
And what if I wanted to know the character that follows the word (' ' or '\n')? I added this after the while loop:
printf("%c",c);
but it doesn't print anything. Any ideas?
Consider:
while(c==' ' || c=='\t' || c=='\n')
    c = fgetc(fp);
So, at this point, two things that c is not are ' ' and '\n'. Then:
while(c!=' ' && c!='\n')
{
    s[i] = c;
    i++;
}
Since the value of c does not change in the loop, the while condition is always true. Meaning that pretty quickly, s[i] will go out of bounds. You need to check against the length of s, probably by getting that passed in as a parameter (not to mention, rethink your algorithm a bit -- probably you want to fgetc more inside the loop).
You have to make sure that s has enough space to hold the longest word in the input file. You also need to check for end of file. Here is a working version; I hope it works for you as well.
#include <stdio.h>
void LeeCaracter(FILE * fp, char * s)
{
    int c; /* fgetc() returns an int, so that EOF differs from every character */
    int i = 0;
    c = fgetc(fp);
    if (feof(fp)) return;
    while (c == ' ' || c == '\t' || c == '\n')
        c = fgetc(fp);
    while (!feof(fp) && (c != ' ' && c != '\n')) {
        s[i++] = c;
        c = fgetc(fp);
    }
    s[i] = '\0';
    printf("%s\n", s);
}
int main(void)
{
    char s[128]; /* assuming no word is larger than this size */
    FILE *fp = fopen("/usr/share/dict/words", "r");
    if (fp == NULL) { /* fopen() fails if the file does not exist */
        perror("fopen");
        return 1;
    }
    while (!feof(fp)) {
        LeeCaracter(fp, s);
    }
    fclose(fp);
    return 0;
}
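The answers above suggest passing the size of s as a parameter and testing for end of file. Both can be combined in one function, as in the sketch below. The name read_word and its return convention are assumptions for illustration; the delimiter that follows the word is pushed back with ungetc so the caller can still read it, which also covers the follow-up question.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the next whitespace-delimited word into s,
 * storing at most size - 1 characters. Returns the stored length, or -1
 * if end of file is reached before any word starts. */
int read_word(FILE *fp, char *s, size_t size)
{
    size_t i = 0;
    int c;

    if (size == 0)
        return -1;
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;                       /* skip leading whitespace */
    if (c == EOF) {
        s[0] = '\0';
        return -1;
    }
    do {
        if (i + 1 < size)
            s[i++] = (char)c;   /* characters that do not fit are dropped */
        c = fgetc(fp);
    } while (c != EOF && !isspace(c));
    if (c != EOF)
        ungetc(c, fp);          /* leave the delimiter for the caller */
    s[i] = '\0';
    return (int)i;
}
```

After a successful call, fgetc(fp) returns the character that ended the word, for example ' ' or '\n'.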

Resources