C filling an array from a text file - c

I'm trying to fill an array from a text file. I'm using fgetc and my problem is dealing with the newline characters that are in the text file. I've currently got,
for(i = 0; i < rows; i++){
for(j = 0; j < columns; j++){
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r')){
fgetc(fp);
array[i][j] = fgetc(fp);
else{
array[i][j] = fgetc(fp);
}
printf("i %d j %d char %c code %d\n", i, j, array[i][j], array[i][j]);
}
}
The idea is that if there's a newline character I want to advance the file pointer while in the same i,j position of the loop so I can get the next character. The output for this is jumbled for the first two rows and then it starts reading characters with character code -1. Am I doing something terribly wrong?

Each call to fgetc will advance the file pointer. Try calling it once:
int c = fgetc(fp);
then test the value of c. Store it if you want or go through the loop again.
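As a minimal sketch of that advice, the helper below reads one character per iteration into an int and only stores it when it is not a line-ending character. The name fill_grid and the flat row-major layout are assumptions made for illustration; they are not from the question.

```c
#include <stdio.h>

/* Hypothetical helper: fills a rows x cols grid (stored row by row in a
   flat buffer) from fp, skipping '\n' and '\r'.
   Returns the number of cells filled, which is rows * cols on success
   and less than that if the file ended early. */
int fill_grid(FILE *fp, int rows, int cols, char *grid)
{
    int filled = 0;
    int c; /* int, not char, so EOF can be told apart from real characters */

    while (filled < rows * cols && (c = fgetc(fp)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;            /* line ending: read again, same cell */
        grid[filled++] = (char)c;
    }
    return filled;
}
```

Cell (i, j) is then grid[i * cols + j], and a return value smaller than rows * cols signals that EOF was reached before the grid was full.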

In your first if() statement, there's a bit of an issue. When you do this:
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r')){
fgetc(fp);
array[i][j] = fgetc(fp);
You are actually calling fgetc(fp) up to four times: once or twice in the if() condition (the second call is skipped when the first one returns '\n'), and twice more in the body.
Perhaps you are looking more for something like this:
for(i = 0; i < rows; i++){
    for(j = 0; j < columns; j++){
        int test = fgetc(fp);
        if(test != '\n' && test != '\r'){
            array[i][j] = test;
            //Print only after storing, so we never index array[i][-1]
            printf("i %d j %d char %c code %d\n", i, j, array[i][j], array[i][j]);
        }
        else{
            //We want to "undo" the upcoming j++ if we got a line-ending character
            j--;
        }
    }
}
In this example, you call fgetc(fp) exactly once per iteration, and if it's not a \n or \r, you put it in your array.
I'm sorry, I have little experience with fgetc(). If you notice something incredibly awful with what I've done, please notify me!

I can immediately see one source of error. In the following line:
if((fgetc(fp) == '\n') || (fgetc(fp) == '\r'))
There are 2 calls to fgetc(). If the first call does not return '\n', a second call is made and its return value is compared to '\r'. Since the file pointer advances on every call to fgetc, this single test can consume two characters. A better way is to fetch one character, store it, and then test whether it is '\n' or '\r'; only if it is do you need another call to fgetc to get the next character. Store the result in an int rather than a char so that EOF can be told apart from real characters. For example:
int letter = fgetc(fp);
if((letter == '\n') || (letter == '\r'))
...
...
Try this and see if you still get the same error.

I believe you are getting the character in your evaluation statement twice. Also, typically the CRLF (carriage return and line feed) end of line characters can be two characters. Read http://en.wikipedia.org/wiki/Newline for details on this.
#include <stdio.h>
int main ()
{
    FILE *fp;
    int c;
    fp = fopen("file.txt","r");
    if(fp == NULL)
    {
        fprintf(stderr,"Error opening file\n");
        return(-1);
    }
    /* read exactly one character per iteration and stop at EOF before using it */
    while((c = fgetc(fp)) != EOF)
    {
        if ((c != '\n') && (c != '\r')) {
            printf("%c", c); // print everything except CR and LF
        }
    }
    fclose(fp);
    return(0);
}
This was a quick hit at the code from memory. I don't have a compiler readily available but I think it is correct.
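If you want to keep line structure instead of discarding it, one option is to fold every line-ending style into a single '\n'. The sketch below does that with ungetc; the function name read_normalized is made up for this example, and it assumes the stream was opened so that '\r' is actually visible to the program (for instance in binary mode on Windows).

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character from fp, reporting
   "\r\n", a lone '\r', and a lone '\n' all as a single '\n'.
   Returns EOF at end of file, like fgetc. */
int read_normalized(FILE *fp)
{
    int c = fgetc(fp);

    if (c == '\r') {
        int next = fgetc(fp);
        /* Swallow the '\n' of a CRLF pair; anything else goes back. */
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
        return '\n';
    }
    return c;
}
```

The caller can then test for '\n' alone, regardless of whether the file came from Unix, Windows, or classic Mac OS.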

Related

How do I ignore spaces at the end of the inputfile in C?

I am dealing with a weird problem right now. Basically, my C program receives a txt file as input, and I already know how long it should be. Therefore, after reading it, I check for an EOF and exit successfully.
But if there is whitespace after the text, I won't get that EOF. In the text file, the user obviously won't notice the whitespace. How do I ignore it? I guess I can't iterate over it, because that could go on forever? The input is a LEN(gth) x WID(th) grid of the letters a and b (that's why I check for the \n).
for(i = 0; i < LEN; i++)
{
for(j = 0; j < WID; j++)
{
c = fgetc(infile);
if((c != 'a') && (c != 'b'))
{
return INVALID_INPUT;
}
board[i][j] = c;
}
c = fgetc(infile);
if (!(c == '\n' || c == EOF))
{
return INVALID_INPUT;
}
}
if(fgetc(infile) != EOF)
{
return INVALID_INPUT;
}
fclose(infile);
return VALID_INPUT;
If you truly need to check that only whitespace is left at the end, you have no option other than to iterate over it.
/* keep the character that ended the loop in c; a second fgetc would skip past it */
while ((c = fgetc(infile)) != EOF && isspace(c)) {}
if (c != EOF) {
    fclose(infile);
    return INVALID_INPUT;
}
You will need to #include <ctype.h> to be able to use isspace. This loop will skip over all whitespace (like spaces, tabs and newlines) and stop at EOF or any non whitespace character.
Note that you also need to close the file in case of invalid input.
But the big question here is: do you absolutely need to check for other stuff following the regular input or not?
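The check can also be wrapped in a small function. This is only a sketch; the name only_whitespace_left is an assumption, and closing the file is left to the caller.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: consumes the rest of fp and returns 1 if it
   contained nothing but whitespace, 0 if any other character was found. */
int only_whitespace_left(FILE *fp)
{
    int c;

    /* The loop cannot run forever: fgetc returns EOF once the file ends. */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;
    return c == EOF;
}
```

In the question's code, the final test would then read something like if (!only_whitespace_left(infile)) followed by the fclose and return INVALID_INPUT.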

Is there any way to stop a while ((c = getchar()) != EOF) loop if my work inside the loop is done?

I am reading The C Programming Language by Kernighan and Ritchie and trying to solve this exercise:
Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.
I think my solution works, but the problem is that if I don't send EOF, the terminal won't show the result. I know that the loop condition specifies exactly that, but I am wondering whether there is any way to make the program terminate after reading a single line? (Sorry if my explanation of the problem is a bit shallow. Feel free to ask more.)
#include <stdio.h>
int main ()
{
int digits[10];
int nc=0;
int c, i, j;
for (i = 0; i <= 10; i++)
digits[i] = 0;
//take input;
while ((c = getchar ()) != EOF) {
++nc;
if (c == ' ' || c=='\n') {
++digits[nc-1];
//is it also counting the space in nc? i think it is,so we should do nc-1
nc = 0;
}
}
for (i = 1; i <= 5; i++) {
printf("%d :", i);
for (j = 1; j <= digits[i]; j++) {
printf ("*");
}
printf ("\n");
}
// I think this is a problem with getchar()
//the program doesn't exit automatically
//need to find a way to do it
}
You could try something like
while ((c = getchar ()) != EOF && c != '\n') {
and then add a few lines after the while loop to account for the last word:
if (c == '\n' && nc > 0) {
    ++digits[nc-1];
    nc = 0;
}
There is also another problem in your program. ++digits[nc-1]; is correct, but for the wrong reason. The -1 is needed because array indices start at zero: an array of length 10 has indices 0 to 9, so a word of length n belongs in slot n - 1 (there are no words of length zero). The real problem is that you count the space or newline as part of the word, so if two blanks follow a word of length 4, the program records a word of length 5 and then a word of length 1. To avoid this, count only the non-blank characters, and record a word only when a blank follows at least one of them:
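while ((c = getchar ()) != EOF) {
    if (c == ' ' || c == '\n' || c == '\t') {
        // a blank ends a word only if we have seen word characters;
        // extra blanks must not be counted as part of the next word
        if (nc > 0) {
            ++digits[nc-1]; // arrays start at zero
            nc = 0;
        }
    }
    else {
        ++nc;
    }
}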
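Putting both suggestions together, a sketch that stops after one line could look like the following. The function names and the choice to put over-long words in the last bucket are assumptions made for this example, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reads a single line from in (up to and including
   '\n', or up to EOF) and counts word lengths.
   counts[k] receives the number of words of length k + 1; words longer
   than max_len are counted in the last slot (an assumption made here
   to keep the index in bounds). */
void count_line_word_lengths(FILE *in, int counts[], int max_len)
{
    int c;
    int nc = 0; /* length of the word currently being read */

    for (int i = 0; i < max_len; i++)
        counts[i] = 0;

    do {
        c = fgetc(in);
        if (c == ' ' || c == '\t' || c == '\n' || c == EOF) {
            if (nc > 0) { /* a word just ended */
                int slot = (nc > max_len ? max_len : nc) - 1;
                ++counts[slot];
                nc = 0;
            }
        } else {
            ++nc;
        }
    } while (c != '\n' && c != EOF);
}

/* Prints one horizontal bar per word length. */
void print_histogram(const int counts[], int max_len)
{
    for (int i = 0; i < max_len; i++) {
        printf("%2d: ", i + 1);
        for (int j = 0; j < counts[i]; j++)
            putchar('*');
        putchar('\n');
    }
}
```

Calling count_line_word_lengths(stdin, counts, 10) and then print_histogram(counts, 10) shows the result as soon as Enter is pressed, without waiting for EOF.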

C Program won't remove comments that take up the whole line

So I'm working through the K&R C book and there was a bug in my code that I simply cannot figure out.
The program is supposed to remove all the comments from a C program. Obviously I'm just using stdin
#include <stdio.h>
int getaline (char s[], int lim);
#define MAXLINE 1000 //maximum number of characters to put into string[]
#define OUTOFCOMMENT 0
#define INASINGLECOMMENT 1
#define INMULTICOMMENT 2
int main(void)
{
int i;
int isInComment;
char string[MAXLINE];
getaline(string, MAXLINE);
for (i = 0; string[i] != EOF; ++i) {
//finds whether loop is in a comment or not
if (string[i] == '/') {
if (string[i+1] == '/')
isInComment = INASINGLECOMMENT;
if (string[i+1] == '*')
isInComment = INMULTICOMMENT;
}
//fixes the problem of print messing up after the comment
if (isInComment == INASINGLECOMMENT && string[i] == '\0')
printf("\n");
//if the line is done, restates all the variables
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
//prints current character in loop
if(isInComment == OUTOFCOMMENT && string[i] != EOF)
printf("%c", string[i]);
//checks to see of multiline comment is over
if(string[i] == '*' && string[i+1] == '/' ) {
++i;
isInComment = OUTOFCOMMENT;
}
}
return 0;
}
So this works great except for one problem. Whenever a line starts with a comment, it prints that comment.
So for instance, if I had a line that was simply
//this is a comment
without anything before the comment begins, it will print that comment even though it's not supposed to.
I thought I was making good progress, but this bug has really been holding me up. I hope this isn't some super easy thing I've missed.
EDIT: Forgot the getaline function
//puts line into s[], returns length of that line
int getaline(char s[], int lim)
{
int c, i;
for (i = 0; i < lim-1 && (c = getchar()) != '\n'; ++i)
s[i] = c;
if (c == '\n') {
s[i] = c;
++i;
}
s[i] = '\0';
return i;
}
There are many problems in your code:
isInComment is not initialized in function main.
As others pointed out, string[i] != EOF is wrong. You need to test for end of file more precisely, especially for files that do not end with a linefeed. This test only works if the char type is signed and EOF is a valid signed char value. Even then, it will mistakenly stop on a stray \377 character, which is legal in a string or in a comment.
When you detect the end of line, you read another line and reset i to 0, but i will be incremented by the for loop before you test again for single line comment... hence the bug!
You do not handle special cases such as /* // */ or // /*
You do not handle strings. This is not a comment: "/*", nor this: '//'
You do not handle \ at end of line (escaped linefeed). This can be used to extend single line comments, strings, etc. There are more subtle cases related to \ handling and if you really want completeness, you should handle trigraphs too.
Your implementation has a limit for line size, this is not needed.
The problem you are assigned is a bit tricky. Instead of reading and parsing lines, read one character at a time and implement a state machine to parse escaped linefeeds, strings, and both comment styles. The code is not too difficult if you do it right with this method.
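A minimal sketch of such a state machine is shown below. The function name strip_comments and the state names are made up for illustration. It handles both comment styles plus string and character literals, and it deliberately leaves out escaped linefeeds and trigraphs, so it is a starting point rather than a complete solution.

```c
#include <stdio.h>

enum state {
    CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
    STRING, STRING_ESC, CHARLIT, CHARLIT_ESC
};

/* Hypothetical helper: copies in to out with comments removed.
   A block comment is replaced by one space; the newline that ends a
   line comment is kept. */
void strip_comments(FILE *in, FILE *out)
{
    enum state st = CODE;
    int c;

    while ((c = fgetc(in)) != EOF) {
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; } /* hold the slash back */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARLIT;
            fputc(c, out);
            break;
        case SLASH:
            if (c == '/') st = LINE_COMMENT;
            else if (c == '*') st = BLOCK_COMMENT;
            else {                 /* plain '/': emit it, reprocess c */
                fputc('/', out);
                ungetc(c, in);
                st = CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') { fputc(c, out); st = CODE; }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') { fputc(' ', out); st = CODE; }
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case STRING:
            fputc(c, out);
            if (c == '\\') st = STRING_ESC;
            else if (c == '"') st = CODE;
            break;
        case STRING_ESC:           /* character after a backslash */
            fputc(c, out);
            st = STRING;
            break;
        case CHARLIT:
            fputc(c, out);
            if (c == '\\') st = CHARLIT_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHARLIT_ESC:
            fputc(c, out);
            st = CHARLIT;
            break;
        }
    }
    if (st == SLASH)               /* input ended right after a '/' */
        fputc('/', out);
}
```

Because the input is read one character at a time, there is no line buffer and therefore no line length limit, and a comment at the very start of a line needs no special handling.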
if (string[i] == '\0') {
getaline(string, MAXLINE);
i = 0;
if (isInComment != INMULTICOMMENT)
isInComment = OUTOFCOMMENT;
}
When you start a new line, you initialize i to 0. But then in the next iteration:
for (i = 0; string[i] != EOF; ++i)
i will be incremented, so you'll begin the new line with index 1. Therefore there is a bug when the line begins with //.
You can see that it solves the problem if you write instead:
if (string[i] == '\0') {
    getaline(string, MAXLINE);
    if (isInComment != INMULTICOMMENT)
        isInComment = OUTOFCOMMENT;
    i = -1;   // the for loop's ++i brings this back to 0
    continue; // skip the rest of the body so string[-1] is never read
}
though it's usually considered bad style to modify a for loop's index inside the loop. You may want to redesign your implementation in a more readable way.
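One possible redesign, sketched under simplifying assumptions, is to use an outer loop that reads lines and an inner loop that scans each line, so the index always starts at 0 without any manual reset. The function name strip_line_comments is hypothetical, the sketch only handles // comments, and it ignores string literals and lines longer than the buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 1000

/* Hypothetical helper: copies in to out, cutting each line at the first
   "//". Block comments and string literals are NOT handled here. */
void strip_line_comments(FILE *in, FILE *out)
{
    char line[MAXLINE];

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int ends_with_newline = len > 0 && line[len - 1] == '\n';
        int cut = 0;

        /* i starts at 0 for every line, so a comment in column 0 is seen.
           line[i + 1] is safe: at worst it is the terminating '\0'. */
        for (size_t i = 0; i < len; ++i) {
            if (line[i] == '/' && line[i + 1] == '/') {
                line[i] = '\0'; /* cut the line at the comment */
                cut = 1;
                break;
            }
        }
        fputs(line, out);
        if (cut && ends_with_newline)
            fputc('\n', out);   /* restore the newline that was cut off */
    }
}
```

Using fgets here also takes care of end of file: the outer loop stops when fgets returns NULL, so no comparison of a char against EOF is needed.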

Printing C Array to a terminal

I need to read in a file from C, store it in an array and print its contents. For some reason I keep seeing octal in my output near the end. I am dynamically creating the array after counting how many lines and characters are in it after opening the file.
output:
Abies
abies
abietate
abietene
abietic
abietin
\320ΡΏ_\377Abietineae --> umlaut? where did he come from?
y\300_\377abietineous
code:
int main(int argc, char ** argv) {
char c = '\0';
FILE * file;
int i = 0, j = 0, max_line = 0, max_char_per_line = 0;
/* get array limits */
file = fopen(argv[1], "r");
while ((c = fgetc(file)) != EOF){
if (c == '\n'){
max_line++; j++;
if (j > max_char_per_line){
max_char_per_line = j;
}
j = 0;
continue;
}
j++;
}
rewind(file);
/* declare array dynamically based on max line and max char */
char word[max_line][max_char_per_line];
/*read in file*/
j = 0; c = '\0';
while ((c = fgetc(file)) != EOF){
if (c == '\n'){
word[i][j] = '\0';
i++; j=0;
continue;
}
word[i][j] = c;
j++;
}
word[i][j] = '\0';
fclose(file);
for (i = 0; i < max_line; i++){
printf("%s\n", word[i]);
}
return 0;
}
Change read routine:
if (c == '\n'){
word[i][j] = 0x0;
i++; j=0;
continue;
}
and add the "\n" back in the printf routine.
for (i = 0; i < max_line; i++){
printf("%s\n", word[i]);
}
C strings are zero-terminated, not "\n"-terminated, so when you printf()ed them, printf() did not know where to stop printing.
You aren't terminating your strings. You need to add the null-terminator: \0, after the last character for each line.
In your first loop, you determine enough space for the longest line, including a newline character.
If you want to keep the newlines in your input array, just add 1 to max_char_per_line, and add the null-terminator after the newline character when you finish each line in your second loop.
If you don't need the newline in your input array, instead simply use that space for the null-terminator.
Not that it explains exactly the phenomenon you observe, but it may. You do not seem to take into account the terminating zero byte when calculating array boundaries. Just ++ the max_char_per_line after doing the calculations. And don't forget to add this zero byte if the array isn't guaranteed to be zero-initialized.
edit: do you see these lines after the output or in one of these lines of output?
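To make the sizing advice concrete, here is a sketch of the two passes with room reserved for the terminator. The function names and the flat row-major buffer are assumptions for this example. It also reads into an int rather than a char, so that a \377 byte in the file cannot be mistaken for EOF.

```c
#include <stdio.h>

/* Hypothetical helper: first pass. Counts the lines in fp and the length
   of the longest one (not counting '\n'). A final line without a
   trailing newline is counted too. */
void measure_lines(FILE *fp, int *lines, int *longest)
{
    int c, len = 0;

    *lines = 0;
    *longest = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (len > *longest) *longest = len;
            ++*lines;
            len = 0;
        } else {
            ++len;
        }
    }
    if (len > 0) { /* last line had no newline */
        if (len > *longest) *longest = len;
        ++*lines;
    }
}

/* Hypothetical helper: second pass. Stores each line as a
   null-terminated string in row i of words, where each row is width
   bytes. width must be at least longest + 1 to leave room for '\0'.
   Returns the number of lines stored. */
int read_lines(FILE *fp, int max_lines, int width, char *words)
{
    int c, i = 0, j = 0;

    while (i < max_lines && (c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            words[i * width + j] = '\0';
            ++i;
            j = 0;
        } else if (j < width - 1) {
            words[i * width + j++] = (char)c;
        }
    }
    if (j > 0 && i < max_lines) { /* terminate an unfinished last line */
        words[i * width + j] = '\0';
        ++i;
    }
    return i;
}
```

A caller would run measure_lines, rewind the file, allocate lines * (longest + 1) bytes, and then call read_lines with width set to longest + 1; row i starts at words + i * width.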

C program to sort characters in a string

I have this program in C I wrote that reads a file line by line (each line has just one word), sorts the letters of each word, and then displays the sorted word and the original word on each line.
#include<stdio.h>
int main()
{
char line[128];
int i=0;
int j;
int length;
while(fgets(line,sizeof line,stdin) != NULL)
{
char word[128];
for (i=0; line[i] != '\0'; i++)
{
word[i]=line[i];
}
while (line[i] != '\0')
i++;
length=i;
for (i=length-1; i >=0; i--)
{
for (j=0; j<i; j++)
{
if (line[j] > line[i])
{
char temp;
temp = line[j];
line[j] = line[i];
line[i]=temp;
}
}
}
printf("%s %s",line,word);
}
return 0;
}
I'm compiling and running it using the following bash commands.
gcc -o sign sign.c
./sign < sample_file | sort > output
The original file (sample_file) looks like this:
computer
test
file
stack
overflow
The output file is this:
ackst stack
cemoprtu computer
efil file
efloorvw overflow
er
estt test
ter
ter
I'm having two issues:
The output file has a bunch of newline characters at the beginning (ie. about 5-7 blank lines before the actual text begins)
Why does it print 'ter' twice at the end?
PS - I know these are very elementary questions, but I only just started working with C / bash for a class and I'm not sure where I am going wrong.
Problem 1
After this code, the variable line contains a line of text, including the newline character from the end of the string
while(fgets(line,sizeof line,stdin) != NULL)
{
This is why you are getting the "extra" newlines. The ASCII value for newline is less than the ASCII value for 'A'. That is why the newlines end up at the beginning of each string, once you've sorted the characters. E.g. "computer\n" becomes "\ncemoprtu".
To solve this, you can strip the newlines off the end of your strings, after the for-loop
if(i > 0 && word[i-1] == '\n')
{
word[i-1] = '\0';
line[i-1] = '\0';
--i;
}
...
printf("%s %s\n",line,word); /* notice the addition of the newline at the end */
This happens to solve problem 2, as well, but please read on, to see what was wrong.
Problem 2
After the loop
for (i=0; line[i] != '\0'; i++) { /* */ }
The string word will not be null-terminated (except by blind luck, since word holds random uninitialized memory). This is why you get the "ter": it is part of the data left behind when you copied the word "computer" to word.
Problem 3
After the loop
for (i=0; line[i] != '\0'; i++) { /* */ }
The value of line[i] != '\0' will always be false. This means that this code will do nothing
while (line[i] != '\0')
i++;
It might make the problem more obvious if I replace the for-loop and the while-loop with basically identical code, using goto:
i=0;
begin_for_loop:
if(line[i] != '\0')
{
{
word[i]=line[i];
}
i++;
goto begin_for_loop;
}
begin_while_loop:
if(line[i] != '\0')
{
i++;
goto begin_while_loop;
}
(btw, most professional programmers will do anything from laugh to yell at you if you mention using goto :) I only am using it here to illustrate the point)
A tip I find handy is to draw out my arrays, variables etc on a piece of paper, then trace through each line of my code (again, on paper) to debug how it works.
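For comparison, here is a compact sketch that avoids all three problems by leaning on the standard library. The function name sign_line is made up for this example; strcspn strips the newline, strcpy copies the word together with its terminator, and qsort replaces the hand-written sort.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders bytes by their unsigned value. */
static int cmp_char(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Hypothetical helper: removes a trailing newline from line, copies the
   original word into word (which must be at least as large as line),
   then sorts the characters of line in place. */
void sign_line(char *line, char *word)
{
    line[strcspn(line, "\n")] = '\0';  /* cut at the newline, if any */
    strcpy(word, line);                /* copies the '\0' as well */
    qsort(line, strlen(line), 1, cmp_char);
}
```

Inside the fgets loop, the body then reduces to a call to sign_line(line, word) followed by printf("%s %s\n", line, word).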

Resources