Infinite loop when getting a line with getchar() - c

I was trying an exercise from K&R (ex 1-17), and I came up with my own solution.
The problem is that my program appears to hang, perhaps in an infinite loop. I omitted the NUL ('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
Can somebody please help me find out what's wrong?
I'm using the GCC compiler with Cygwin on win8(x64), if that helps..
Question - Print all input lines that are longer than 80 characters
#include<stdio.h>
#define MINLEN 80
#define MAXLEN 1000
/* getlin : inputs the string and returns its length */
int getlin(char line[])
{
int c,index;
for(index = 0 ; (c != '\n') && ((c = getchar()) != EOF) && (index < MAXLEN) ; index++)
line[index] = c;
return (index); // Returns length of the input string
}
main()
{
int len;
char chArr[MAXLEN];
while((len = getlin(chArr))>0)
{
/* A printf here,(which I had originally inserted for debugging purposes) Miraculously solves the problem!!*/
if(len>=MINLEN)
printf("\n%s",chArr);
}
return 0;
}

And I omitted the null('\0') character insertion as I find C generally automatically attaches it to the end of a string (Doesn't it?).
No, it doesn't. You're using getchar() to read input characters one at a time. If you put the chars in an array yourself, you'll have to terminate it yourself.
The C functions that return a string will generally terminate it, but that's not what you're doing here.
Your input loop is a little weird. The logical AND operator only evaluates the right-hand side if the left-hand side evaluates to true (it's called "short-circuiting"), so the tests run strictly left to right, and your loop tests c before getchar() has assigned it a value. Rearranging the order of the tests in the loop should help.
for(index = 0 ; (index < MAXLEN) && ((c = getchar()) != EOF) && (c != '\n'); index++)
line[index] = c;
This way, c receives a value from getchar() before you perform tests on its contents.
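Putting both points of this answer together (test order and manual termination), a minimal sketch could look like the following. The name read_line and the FILE * and max parameters are assumptions made for illustration; they are not part of the question's code.

```c
#include <stdio.h>

/* read_line: hypothetical helper, not from the question.
   Reads one line from `in` into line[], storing at most max - 1
   characters and then a terminating '\0' (max must be at least 1).
   Returns the number of characters stored; the newline is consumed
   but not stored. */
int read_line(FILE *in, char line[], int max)
{
    int c;
    int index = 0;

    /* Bounds check first, then read, then test what was just read. */
    while (index < max - 1 && (c = getc(in)) != EOF && c != '\n')
        line[index++] = (char)c;

    line[index] = '\0';   /* nothing in C does this for us here */
    return index;
}
```

Note that this sketch returns 0 both for an empty line and at end of input, so a caller that loops while the result is greater than 0 will also stop at the first empty line.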

I'm not positive about what's wrong, but you don't provide the input to the program so I'm guessing.
My guess is that in getlin your variable c gets set to '\n' and at that point it never gets another character. It just keeps returning and looping.
You never SET c to anything inside your getlin function before you test it, is the problem.

C does not insert a NUL terminator at the end of strings automatically. Some functions might do so (e.g. snprintf). Consult your documentation. Additionally, take care to initialize all your variables, like c in getlin().

1) C doesn't add a final \0 to your string. You are responsible for using an array of at least 81 chars and putting the final \0 after the last character you write in it.
2) You're testing the value of c before reading it
3) Your program doesn't print anything because printf uses a buffer for I/O which is flushed when you send \n. Modify this statement to print a final \n:
printf("\n%s",chArr);
to become:
printf("%s\n",chArr);
4) To send an EOF to your program, press Ctrl+D under Unix; in a native Windows console the usual equivalent is Ctrl+Z followed by Enter. If no EOF is ever sent, that may be the reason why the program never ends.

Related

What index the i-th element of any array refer to in c?

I stumbled upon an assignment in a piece of code that adds a null character to an array, line[i] = '\0', to explicitly mark it as a string. That raised a question for me: since the null character sits exactly at the end of any string, how do we know that storing \0 in the i-th element of line puts it in the last position? As I see it, i could be any index into line. So does the i-th index of an array always refer to its last position, or what?
Code like this usually appears just after code that has used the same index variable i to construct the string.
For example:
char string[10];
int i = 0;
string[i++] = 'a';
string[i++] = 'b';
string[i++] = 'c';
string[i] = '\0';
Or, more realistically:
char line[100];
int i = 0;
int c;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
line[i] = '\0';
This second example reads one line of text from standard input and stores it in the line array as a proper, null-terminated string.
(In real code, of course, you also have to worry about the possibility of overflowing the array.)
To make things really clear, you can imagine writing code like this more explicitly, with a separate variable to hold the length of the string. For example:
i = 0;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
int length_of_string = i;
line[length_of_string] = '\0';
When you see that line
line[length_of_string] = '\0';
it makes it more obvious that the \0 terminator is being stored at a spot in the string that someone has actually determined to be the length of the string. But as you can see, since the variable length_of_string has just been set based on the value of i after the loop, it's perfectly equivalent to just write
line[i] = '\0';
There's sort of an academic-sounding term called loop invariant, but code like this ends up being a perfect example of what it means, and it's worth thinking about for a moment. A loop invariant is something you can say about a loop that's true at all times, for every trip through the loop, at the beginning or the end or in the middle of the loop. For the read-a-line loop I've just shown, the loop invariant is:
i always contains the number of characters that have been read into the string line.
Let's look at all of the ways this "loop invariant" is true. To make things very clear, I'm going to write the loop again, with some comments to make it clear what I mean by the "top" and "bottom" of the loop:
i = 0;
while((c = getchar()) != EOF && c != '\n') {
/* top of loop */
line[i++] = c;
/* bottom of loop */
}
Before the loop runs, the string is empty, so i starts out as 0.
At the top of the loop, before the line[i++] = c step, i still has the value it did last time through the loop.
In the middle of the loop, the line line[i++] = c simultaneously stores the character c into the line array (and at the right spot!), and increments i.
At the bottom of the loop, after the line[i++] = c step, i contains the updated number of characters in the string.
After the loop (and this was your question), since i still contains the number of characters that have been read and stored into line, it's precisely the right index to use to null-terminate the string, with the line line[i] = '\0'.
The other thing that's worth paying attention to here is that the line in the middle of the loop, that simultaneously stores the next character into the line array, at the right spot, and increments i at the same time, is, once again:
line[i++] = c;
My question for you to think about is, what if I had instead written
line[++i] = c; /* WRONG */
It can be hard, at first, to really understand the difference between i++ and ++i, to understand why you would care, to understand why you might pick one over the other. This code here, I think, is an example that really makes the point.
(For extra credit, think about this: What if arrays in C were 1-based, instead of 0-based? What parts of the read-one-line loop would change, and is it still possible to maintain all facets of the loop invariant?)
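For readers who want to check their answer to the i++ versus ++i question, here is a small sketch that runs both variants on an in-memory string instead of getchar(). The helper names store_post and store_pre are made up for this illustration.

```c
#include <stddef.h>

/* store_post: hypothetical helper using line[i++].
   Invariant: i is always the number of characters stored so far. */
size_t store_post(const char *src, char *line)
{
    size_t i = 0;
    while (*src != '\0')
        line[i++] = *src++;   /* store at index i, then advance i */
    line[i] = '\0';           /* i is the length, so this is the right spot */
    return i;
}

/* store_pre: the same loop written with line[++i].
   The caller must provide one extra element, since index 0 is skipped. */
size_t store_pre(const char *src, char *line)
{
    size_t i = 0;
    while (*src != '\0')
        line[++i] = *src++;   /* advance i first, then store */
    line[i] = '\0';           /* overwrites the last character stored */
    return i;
}
```

With the input "abc" and a buffer pre-filled with '?', store_post leaves "abc" in the buffer, while store_pre leaves "?ab": element 0 is never written, and the terminator lands on top of the final 'c'.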
If you already have an existing string and just want to terminate it with \0 at the last+1 index, write a function to determine that position. E.g. check the char at the current position and check whether the next position contains a legitimate value. You can then go through the whole string to find the last position, return a pointer to last position+1, and set your terminator there. If you work with a variety of predefined strings, this would be the most scalable approach for me.

Stdin + Dictionary Text Replacement Tool -- Debugging

I'm working on a project in which I have two main files. Essentially, the program reads in a text file defining a dictionary with key-value mappings. Each key has a unique value and the file is formatted like this where each key-value pair is on its own line:
ipsum i%##!
fubar fubar
IpSum XXXXX24
Ipsum YYYYY211
Then the program reads in input from stdin, and if any of the "words" match the keys in the dictionary file, they get replaced with the value. There is a slight thing about upper and lower cases -- this is the order of "match priority"
The exact word is in the replacement set
The word with all but the first character converted to lower case is in the replacement set
The word converted completely to lower case is in the replacement set
Meaning if the exact word is in the dictionary, it gets replaced, but if not the next possibility (2) is checked and so on...
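The three candidate keys in that priority list can be produced by one small routine. The following is only a sketch under assumptions: the name lower_from and its start parameter are invented here, and the dictionary lookup itself is not shown.

```c
#include <ctype.h>
#include <stddef.h>

/* lower_from: hypothetical helper, not from the project code.
   Copies word into out, converting every character at index >= start
   to lower case. out must have room for strlen(word) + 1 bytes. */
void lower_from(const char *word, char *out, size_t start)
{
    size_t i;

    for (i = 0; word[i] != '\0'; i++) {
        if (i >= start)
            out[i] = (char)tolower((unsigned char)word[i]);
        else
            out[i] = word[i];
    }
    out[i] = '\0';
}
```

The exact word is tried first as is; lower_from(word, buf, 1) gives the second candidate (all but the first character lowered), and lower_from(word, buf, 0) gives the third (everything lowered).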
My program passes the basic cases we were provided but then the terminal shows
that the output vs reference binary files differ.
I went into both files (not c files, but binary files), and one was super long with tons of numbers and the other just had a line of random characters. So that didn't really help. I also reviewed my code and made some small tests but it seems okay? A friend recommended I make sure I'm accounting for the null operator in processInput() and I already was (or at least I think so, correct me if I'm wrong). I also converted getchar() to an int to properly check for EOF, and allocated extra space for the char array. I also tried vimdiff and got more confused. I would love some help debugging this, please! I've been at it all day and I'm very confused.
There are multiple issues in the processInput() function:
the loop should not stop when the byte read is 0, you should process the full input with:
while ((ch = getchar()) != EOF)
the test for EOF should actually be done differently so the last word of the file gets a chance to be handled if it occurs exactly at the end of the file.
the cast in isalnum((char)ch) is incorrect: you should pass ch directly to isalnum. Casting as char is actually counterproductive because it will turn byte values beyond CHAR_MAX to negative values for which isalnum() has undefined behavior.
the test if(ind >= cap) is too loose: if word contains cap characters, setting the null terminator at word[ind] will write beyond the end of the array. Change the test to if (cap - ind < 2) to allow for a byte and a null terminator at all times.
you should check that there is at least one character in the word to avoid calling checkData() with an empty string.
char key[ind + 1]; is useless: you can just pass word to checkData().
checkData(key, ind) is incorrect: you should pass the size of the buffer for the case conversions, which is at least ind + 1 to allow for the null terminator.
the cast in putchar((char)ch); is useless and confusing.
There are some small issues in the rest of the code, but none that should cause a problem.
Start by testing your tokeniser with:
$ ./a.out <badhash2.c >zooi
$ diff badhash2.c zooi
$
Does it work for binary files, too?:
$ ./a.out <./a.out > zooibin
$ diff ./a.out zooibin
$
Yes, it does!
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
void processInput(void);
int main(int argc, char **argv) {
processInput();
return 0;
}
void processInput() {
int ch;
char *word;
int len = 0;
int cap = 60;
word = malloc(cap);
while(1) {
ch = getchar(); // (1)
if( ch != EOF && isalnum(ch)) { // (2)
if(len+1 >= cap) { // (3)
cap += cap/2;
word = realloc(word, cap);
}
word[len++] = ch;
} else {
if (len) { // (4)
#if 0
char key[len + 1];
memcpy(key, word, len); key[len] = 0;
checkData(key, len);
#else
word[len] = 0;
fputs(word, stdout);
#endif
len = 0;
}
if (ch == EOF) break; // (5)
putchar(ch);
}
}
free(word);
}
I only repaired your tokeniser, leaving out the hash table and the search & replace stuff. It is now supposed to generate a verbatim copy of the input. (which is silly, but great for testing)
(1) If you want to allow binary input, you cannot use while((ch = getchar()) ... && ch != EOF): a NUL in the input would cause the loop to end. You must postpone testing for EOF, because there could still be a final word in your buffer.
(2) Treat EOF just like a space here: it could be the end of a word.
(3) You must reserve space for the NUL ('\0'), too.
(4) If len == 0 there is no word, so there is no need to look it up.
(5) We treated EOF just like a space, but we don't want to write it to the output. Time to break out of the loop.

C dealing with variable length string

I'm new to C, taking a university course.
In one of the tasks I'm given, I deal with strings. I take strings either entered by user or parsed from a file and then use a function on them to produce an answer (if a specific quality exists).
The string can be of variable length but it is acceptable to assume that their maximum length is 80 characters.
I created the program using a
char s[81];
and then filling up the same array with the different strings each time.
Since the string has to be null-terminated I just added a '\0' at index 80;
s[80] = '\0';
But then I got all kind of weird behaviors - Unrelated characters at the end of the string I entered. I assumed this is because there was space between the end of the 'real' characters and the '\0' character filled with garbage(?).
So what I did is I created a function:
void clean_string(char s[], int string_size) {
int index = 0;
while(index < string_size) {
s[index++] = '\0';
}
}
What I call clean, is just filling a string up with zero characters. I do this every time I am done dealing with a string and ready to accept a new one. Then I fill up the string again character by character and when ever I'll stop, the following character will be a '\0' for sure.
To not include any magic numbers in code (81 each time I call clean_string) I used the following:
#define STRING_LENGTH 81
That works for me. The strings show no strange behavior. But I wondered if this is considered bad practice. Are there problems with this approach?
Just emphasizing, I'm not asking for help in the assignment itself, but tips on how to approach these kind of situations better.
Rather than prefilling the entire array with zeros, it should be simple to just add a single zero after you've read all relevant characters.
For example:
char s[STRING_LENGTH];
int c;
int idx = 0;
while ((idx < STRING_LENGTH - 1) && ((c = getchar()) != EOF) && (c != '\n')) {
s[idx++] = c;
}
s[idx] = 0;

Taking a string as input and storing them in a character array in C [closed]

I am stumped on how to store strings in an array in C, with each character kept separately. As an example, if the user inputs hellop, I want to store it in a given array, say userText, with userText[0] = h, userText[1] = e, userText[2] = l, and so on. I know this is easy stuff, but I'm still new. So if anyone could help, it would be great. Please explain how to do this using pointers.
#include<stdio.h>
void main()
{
char a[10],c;
int i=0;
while((c=getchar())!='\n')
{
scanf("%c",&a[i++]);
c=getchar();
}
for(i=0;i<11;i++)
printf("%c",a[i]);
}
The program outputs some garbage value (eoeoeoeo\363) when I type in hellop.
To read input I recommend using the fgets function. It's a nice, safe alternative to scanf.
First let's declare a buffer like so:
char user_input[20];
Then we can get user input from the command line in the following manner:
fgets(user_input, 20, stdin);
This will store a maximum of 19 characters from the standard input into the string (fgets reads at most one less than the size it is given) and it will ensure it is null-terminated. The fact that we've limited the input to the size of the array declared earlier ensures that there's no possibility of buffer overruns.
Then let's clear the pesky newline that may have been stored in the string, using strlen:
size_t len = strlen(user_input);
if (len > 0 && user_input[len - 1] == '\n')
user_input[len - 1] = '\0';
As strlen returns the length of the string up to, but not including, the null terminator, the last character sits at index len - 1. That character is the newline (\n) only if the whole line fit in the buffer, so we check for it before replacing it with a null terminator (\0) to end the string there.
Finally, let's print it using printf:
printf("The user has entered '%s'\n", user_input);
To use fgets and printf you will need to declare the following header:
#include <stdio.h>
For strlen we need another header, namely:
#include <string.h>
Job done.
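As an alternative way to drop the newline, strcspn can locate it (or the end of the string if there is none) in a single call. This is a sketch only; the wrapper name read_trimmed and its FILE * parameter are assumptions added so the steps can be exercised on any stream.

```c
#include <stdio.h>
#include <string.h>

/* read_trimmed: hypothetical wrapper around fgets.
   Reads one line into buf (at most size - 1 characters) and removes
   the trailing newline if one was stored.
   Returns 1 on success, 0 on end of file or read error. */
int read_trimmed(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return 0;

    /* strcspn returns the index of the first '\n', or the string
       length when there is no newline, so this write is always
       inside the string fgets produced. */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A call such as read_trimmed(stdin, user_input, sizeof user_input) would then replace the separate fgets and newline-removal steps.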
P.S. If I may address the code you've added to your question.
main is normally declared as int main rather than void main which also requires that main returns a value of some sort. For small apps normally return 0; is put just before the closing brace. This return is used to indicate to the OS if the program executed successfully (0 means everything was OK, non-zero means there was a problem).
You are not null-terminating your string which means that if you were to read in any other way other than with a careful loop, you will have problems.
You take input from the user twice - once with getchar and then with scanf.
If you insist on using your code I've modified it a bit:
#include<stdio.h>
int main()
{
char a[10];
int c;
int i=0;
while( i < 9 && (c=getchar()) != EOF && c != '\n') /* take input until newline, end of input, or 9 characters stored */
a[i++] = c;
a[i] = '\0'; /* null-terminate the string; i is at most 9, so this stays inside a */
i = 0;
while(a[i] != '\0') /* print until we've hit \0 */
printf("%c",a[i++]);
return 0;
}
It should now work.
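To read a string into a char array (using getline, which is POSIX rather than standard C):
char *a = NULL;
size_t len = 0;
ssize_t read;
read = getline(&a, &len, stdin); // returns -1 on error or end of file
//free memory
free(a);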
Your code is this (except I've added a bunch of spaces to improve its readability):
1 #include <stdio.h>
2 void main()
3 {
4 char a[10], c;
5 int i = 0;
6 while ((c = getchar()) != '\n')
7 {
8 scanf("%c", &a[i++]);
9 c = getchar();
10 }
11 for (i = 0; i < 11; i++)
12 printf("%c", a[i]);
13 }
Line-by-line analysis:
1. OK (now I've added the space between #include and <stdio.h>).
2. The main() function returns an int.
3. OK (it is hard to get an open brace wrong).
4. Since the return value of getchar() is an int, you need to declare c separately as an int.
5. OK.
6. Needs to account for EOF; should be while ((c = getchar()) != EOF && c != '\n'). You're still very open to buffer overflow, though.
7. OK.
8. Not OK. This reads another character from standard input, and doesn't check for EOF.
9. Not OK. This too reads another character from standard input. But when you go back to the top of the loop, you read another character. So, as things stand, if you type abcdefg at the program, c is assigned 'a' in the loop control, then a[0] is assigned 'b', then c is assigned 'c', then the loop repeats with a[1] getting 'e'. If I'd typed 6 characters plus newline, the loop would terminate cleanly. Because I claimed I typed 7 characters, the third iteration assigns 'g' to c, which is not newline, so a[2] gets the newline, and the program waits for more input with the c = getchar(); statement at the end of the loop.
10. OK (ditto close braces).
11. Not OK. You don't take into account early termination of the loop, and you unconditionally access a non-existent element a[10] of the array a (which only has elements 0..9; C is not BASIC!).
12. OK.
13. You probably need to output a newline after the for loop. You should return 0; at the end of main().
Because your input buffer is so short, it will be best to code a length check. If you'd used char a[4096];, I'd probably not have bothered you about it (though even then, there is a small risk of buffer overflow with potentially undesirable consequences). All of this leads to:
#include <stdio.h>
int main(void)
{
char a[10];
int c;
int i;
int n;
for (i = 0; i < sizeof(a) && ((c=getchar()) != EOF && c != '\n'); )
a[i++] = c;
n = i;
for (i = 0; i < n; i++)
printf("%c", a[i]);
putchar('\n');
return 0;
}
Note that neither the original nor the revised code null terminates the string. For the given usage, that is OK. For general use, it is not.
The final for loop in the revised code and the following putchar() could be replaced (safely) by:
printf("%.*s\n", n, a);
This is safe because the length is specified so printf() won't go beyond the initialized data. To create a null terminated string, the input code needs to leave enough space for it:
for (i = 0; i < sizeof(a)-1 && ((c=getchar()) != EOF && c != '\n'); )
a[i++] = c;
a[i] = '\0';
(Note the sizeof(a)-1!)
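The precision form "%.*s" can be tried in isolation. The sketch below uses snprintf into a second buffer so the result can be inspected; the helper name format_prefix is invented for this example.

```c
#include <stdio.h>

/* format_prefix: hypothetical helper.
   Formats the first n bytes of a (which need not be null-terminated)
   into out. The precision given by n stops the conversion from
   reading past a[n - 1]. Returns what snprintf returns. */
int format_prefix(char *out, size_t outsize, const char *a, int n)
{
    return snprintf(out, outsize, "%.*s", n, a);
}
```

For a four-element array holding 'a', 'b', 'c', 'd' with no terminator, asking for n = 3 yields the string "abc", and the fourth element is never read.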

K&R Chapter 1 - Exercise 22 solution, what do you think?

I'm learning C from the k&r as a first language, and I just wanted to ask, if you thought this exercise was being solved the right way, I'm aware that it's probably not as complete as you'd like, but I wanted views, so I'd know I'm learning C right.
Thanks
/* Exercise 1-22. Write a program to "fold" long input lines into two or
* more shorter lines, after the last non-blank character that occurs
* before the n-th column of input. Make sure your program does something
* intelligent with very long lines, and if there are no blanks or tabs
* before the specified column.
*
* ~svr
*
* [NOTE: Unfinished, but functional in a generic capacity]
* Todo:
* Handling of spaceless lines
* Handling of lines consisting entirely of whitespace
*/
#include <stdio.h>
#define FOLD 25
#define MAX 200
#define NEWLINE '\n'
#define BLANK ' '
#define DELIM 5
#define TAB '\t'
int
main(void)
{
int line = 0,
space = 0,
newls = 0,
i = 0,
c = 0,
j = 0;
char array[MAX] = {0};
while((c = getchar()) != EOF) {
++line;
if(c == NEWLINE)
++newls;
if((FOLD - line) < DELIM) {
if(c == BLANK) {
if(newls > 0) {
c = BLANK;
newls = 0;
}
else
c = NEWLINE;
line = 0;
}
}
array[i++] = c;
}
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
return 0;
}
I'm sure you're on the right track, but some pointers for readability:
comment your stuff
name the variables properly and at least give a description if you refuse
be consistent in how you write single-line ifs (imho, always use {} so it's more readable)
the if statement in the last for-loop can be better, like
if(array[0] != NEWLINE)
{
printf("%c", array[line]);
}
That's no good IMHO.
First, it doesn't do what you were asked for. You were supposed to find the last blank after a nonblank before the output line boundary. Your program doesn't even remotely try to do that; it seems to aim at finding the first blank after (margin - 5) characters (where did the 5 come from? what if all the words had 9 letters?). However, it doesn't do that either, because of your manipulation of the newls variable. Also, this:
for(line = 0; line < i; line++) {
if(array[0] == NEWLINE)
;
else
printf("%c", array[line]);
}
is probably wrong, because you check for a condition that never changes throughout the loop.
And, last but not least, storing the whole file in a fixed-size buffer is not good, because of two reasons:
the buffer is bound to overflow on large files
even if it would never overflow, people still wouldn't like you for storing eg. a gigabyte file in memory just to cut it into 25-character chunks
I think you should start again, rethink your algorithm (incl. corner cases), and only after that, start coding. I suggest you:
process the file line-by-line (meaning output lines)
store the line in a buffer big enough to hold the largest output line
search for the character you'll break at in the buffer
then print it (hint: you can terminate the string with '\0' and print with printf("%s", ...)), copy what you didn't print to the start of the buffer, proceed from that
An obvious problem is that you statically allocate 'array' and never check the index limits while accessing it. Buffer overflow waiting to happen. In fact, you never reset the i variable within the first loop, so I'm kinda confused about how the program is supposed to work. It seems that you're storing the complete input in memory before printing it word-wrapped?
So, suggestions: merge the two loops together and print the output for each line that you have completed. Then you can re-use the array for the next line.
Oh, and better variable names and some comments. I have no idea what 'DELIM' is supposed to do.
It looks (without testing) like it could work, but it seems kind of complicated.
Here's some pseudocode for my first thought
const int MAXLINE = ?? — maximum line length parameter
int chrIdx = 0 — index of the current character being considered
int cand = -1 — "candidate index", Set to a potential break character
char linebuf[bufsiz]
int lineIdx = 0 — index into the output line
char buffer[bufsiz] — a character buffer
read input into buffer
for ix = 0 to bufsiz -1
do
if buffer[ix] == ' ' then
cand = ix
fi
linebuf[lineIdx] = buffer[ix]
lineIdx += 1
if lineIdx >= MAXLINE then
linebuf[cand] = NULL — end the string
print linebuf
do something to move remnants to front of line (memmove?)
fi
od
It's late and I just had a belt, so there may be flaws, but it shows the general idea — load a buffer, and copy the contents of the buffer to a line buffer, keeping track of the possible break points. When you get close to the end, use the breakpoint.
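A rough C rendering of that idea follows, as a sketch rather than a finished solution to the exercise. Several things are assumptions: the WIDTH value, the function name fold, the use of FILE * streams instead of stdin and stdout, the decision to drop the blank where a line is broken, and counting a tab as one column.

```c
#include <stdio.h>
#include <string.h>

#define WIDTH 10   /* assumed maximum output line length */

/* fold: hypothetical sketch. Copies in to out, breaking any line
   longer than WIDTH at its last blank, or at WIDTH if it has none. */
void fold(FILE *in, FILE *out)
{
    char line[WIDTH];
    int len = 0;     /* number of characters held in line[] */
    int cand = -1;   /* index of the last blank in line[], or -1 */
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {                    /* input line ends: flush */
            fwrite(line, 1, (size_t)len, out);
            putc('\n', out);
            len = 0;
            cand = -1;
            continue;
        }
        if (len == WIDTH) {                 /* full: break before storing c */
            if (cand >= 0) {
                fwrite(line, 1, (size_t)cand, out);   /* text before the blank */
                len -= cand + 1;
                memmove(line, line + cand + 1, (size_t)len);  /* keep the rest */
            } else {
                fwrite(line, 1, (size_t)len, out);    /* no blank: hard break */
                len = 0;
            }
            putc('\n', out);
            cand = -1;    /* what remains was after the last blank */
        }
        if (c == ' ' || c == '\t')
            cand = len;
        line[len++] = (char)c;
    }
    if (len > 0)                            /* final line without newline */
        fwrite(line, 1, (size_t)len, out);
}
```

Only one output line is held in memory at a time, so the size of the input does not matter. Calling fold(stdin, stdout) from main would turn this into a filter.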
