C fscanf unexpectedly pulling content from next line

EDIT: OK I found the embarrassing mistake. My description array wasn't big enough for that line...
I'm trying to extract related abbreviations and descriptions from a simply formatted text file, but I'm running into a problem on just a single line while every other line works fine.
In the text file I'm reading from, lines 5-7 are:
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
What I'm trying to do with each line is read the abbreviation and store it as a character array and then do the same with the description. For every line except for #6, it works fine.
What I expect is:
print description[line6] => "Preposition or subordinating conjunction"
But what I get is:
print description[line6] => "Preposition or subordinating conjunctio"Adjective"
I'm pretty lost as to why it might be doing that. It seems to be reading data from the next line. Or maybe I ended up overwriting the next line into that array.
#include <stdio.h>
int main(){
FILE *fileToRead = fopen("PennTreebank_POS_Tags.txt", "r");
FILE *fileToWrite = fopen("newFile.txt", "w");
int i, j;
i = j = 0;
int nextChar;
char abbreviation[50][5];
char description[50][40];
while( fscanf(fileToRead, "%*s %s ", abbreviation[i]) != EOF ){
description[i][0] = '"';
while( ((nextChar = fgetc(fileToRead)) != '\n') && (nextChar != EOF) ){
description[i][j] = nextChar;
j++;
}
description[i][j] = '"';
description[i][j+1] = '\0';
j=1;
i++;
}
for( i=0; i<36; i++ ){
printf("%s %s\n", abbreviation[i], description[i]);
}
}

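The text "Preposition or subordinating conjunction" is 40 characters long, and your code also wraps it in two double quotes.
So the array
char description[50][40];
is too small: each row needs 40 bytes for the text, 2 for the quotes and 1 for the '\0' terminator, 43 in total. In practice the last letter, the closing quote and the terminator spill into the start of the next row, where they are overwritten when "Adjective" is read, which is why that text shows up at the end of your output.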
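A minimal sketch of a bounds-checked version of the inner read loop, assuming you are free to factor it into a helper. The name read_quoted_rest and its return convention are hypothetical, not part of the original program; the point is only that every write is checked against the row size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy the rest of the current line into dst,
 * wrapped in double quotes, writing at most size bytes in total.
 * Returns how many characters had to be dropped (0 if the line fit). */
size_t read_quoted_rest(FILE *fp, char *dst, size_t size)
{
    size_t len = 0, dropped = 0;
    int ch;                            /* int, so EOF stays distinguishable */

    assert(fp != NULL && dst != NULL);
    assert(size >= 3);                 /* two quotes plus the '\0' */

    dst[len++] = '"';
    while ((ch = fgetc(fp)) != '\n' && ch != EOF) {
        if (len < size - 2)            /* keep room for closing quote and '\0' */
            dst[len++] = (char)ch;
        else
            dropped++;                 /* consume the character but do not store it */
    }
    dst[len++] = '"';
    dst[len] = '\0';
    return dropped;
}
```

With rows declared as char description[50][43], calling read_quoted_rest(fileToRead, description[i], sizeof description[i]) would hold the 40-character line in full; with the original 40-byte rows it would truncate the text instead of writing into the next row.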


Convert a column in CSV to a string in C

So, I was trying to write a program to take the first column of a CSV file and copy that to a string, but it's not going well. I've included the code and the CSV file, and I'd really appreciate any help.
int main () {
FILE *fp;
fp = fopen("filePATH", "r");
char column[80];
int line_n = 0;
char ch;
while ((ch = fgetc(fp)) != EOF) {
fgets(column, sizeof column, fp);
for (int i = 0; i < sizeof column; ++i){
fscanf(fp, "%[^;]", column);
}
printf("%s \n", column);
}
fclose(fp);
return 0;
}
CSV file:
2;51.5;144.0;24.80
5;62.3;157.0;25.30
10;52.8;141.0;26.60
10;34.5;120.0;24.00
1;41.6;131.0;24.20
5;49.0;144.0;23.80
6;47.1;142.0;23.50
2;51.8;144.5;24.80
1;55.6;135.0;30.50
9;51.9;150.0;23.10
9;48.5;139.0;25.10
The output I have is:
5
10
10
1
5
6
2
1
9
9
48.5;139.0;25.10
So, I don't understand why the program shows me the first column but copies only the last line for the string column.
To check the string column, I used:
char copy[20];
strncpy(copy, column, 18);
printf("%s ", copy);
And the output is:
48.5;139.0;25.10
Almost certainly you want to restructure the loops, but here are some minimal changes to your code that might be instructive:
#include<stdio.h>
char sample_input[] =
"2;51.5;144.0;24.80\n"
"5;62.3;157.0;25.30\n"
"10;52.8;141.0;26.60\n"
"10;34.5;120.0;24.00\n"
"1;41.6;131.0;24.20\n"
"5;49.0;144.0;23.80\n"
"6;47.1;142.0;23.50\n"
"2;51.8;144.5;24.80\n"
"1;55.6;135.0;30.50\n"
"9;51.9;150.0;23.10\n"
"9;48.5;139.0;25.10\n"
;
int
main(void)
{
FILE *fp = fmemopen(sample_input, sizeof sample_input - 1, "r"); /* - 1 excludes the trailing '\0' */
char column[80];
int line_n = 0;
int ch; /* fgetc returns an int. */
while( (ch = fgetc(fp)) != EOF ){
/* put ch back so it can be read by scanf */
ungetc(ch, fp);
/* This fgets does not seem to serve any purpose: */
/* fgets(column, sizeof column, fp); */
ch = ';';
while( ch == ';' && fscanf(fp, "%79[^;\n]", column) == 1 ){
ch = fgetc(fp); /* Consume the ; or \n */
printf("%s%c", column, ch);
}
}
fclose(fp);
return 0;
}
It would probably be better to use the fgets to read each line of data and then use sscanf to parse the line, but I wanted to show that your current code can be made to (mostly) work with minimal changes. Note that the variable used to store the value returned by fgetc must be int rather than char in order to properly compare to EOF. Also, whenever you use %s or %[] in scanf, it is best to add a width modifier to prevent a buffer overflow.
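The fgets-then-sscanf approach mentioned above could look roughly like this. It is a sketch only: the helper names first_field and print_first_column, the FIELD_MAX constant and the 256-byte line buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_MAX 80   /* assumed field buffer size, matching column[80] */

/* Hypothetical helper: copy the text before the first ';' (or newline)
 * of line into field. Returns 1 on success, 0 if the first field is empty. */
int first_field(const char *line, char field[FIELD_MAX])
{
    assert(line != NULL && field != NULL);
    /* %79[^;\n] stores at most 79 characters plus the '\0' */
    return sscanf(line, "%79[^;\n]", field) == 1;
}

/* Print the first column of every line of fp; returns the number printed. */
int print_first_column(FILE *fp)
{
    char line[256];            /* assumed to be longer than any input line */
    char field[FIELD_MAX];
    int printed = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (first_field(line, field)) {
            printf("%s\n", field);
            printed++;
        }
    }
    return printed;
}
```

Because each line is read in full before it is parsed, a malformed field cannot leave the stream stuck in the middle of a line, which is the main practical advantage over calling fscanf on the file directly.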

Why is my character count incorrect?

The following code gets the number of words:
int count = 0;
for (int i = 0; chars[i] != EOF; i++)
{
if (chars[i] == ' ')
{
count++;
}
}
My problem is that it doesn't count the words correctly.
For example, if my file.txt has the following text in it:
spaced-out there's I'd like
It says I have 6 words, when according to MS Word I'd have 4.
spaced-out and in
Gives me a word count of 4.
spaced out and in
Gives me a word count of 6
I'm sorry if this question has been answered before, Google doesn't take into account the special characters in the search, so it is hard to find the answer to coding. I'd preferably have the words just by identifying if it's a space or not.
I tried looking for answers but no one seemed to have exactly the same problem. I know that .txt files might end in \r\n on Windows, but then that should be part of one word. For example:
spaced out and in\r\n
I believe it should still give me 4 words. Also when I add || chars[i] == '\n' as:
for (int i = 0; chars[i] != EOF || chars[i] == '\n'; i++)
I get even more words, 8 for the line
spaced out and in
I am doing this on a Linux-based server, but on an SSH client on Windows. The characters come from a .txt file.
Edit: Okay, here is the code; I left out the #include lines when posting it.
#define BUF_SIZE 500
#define OUTPUT_MODE 0700
int main(int argc, char *argv[])
{
int input, output;
int readSize = 1, writeSize;
char chars[BUF_SIZE];
int count = 0;
input = open(argv[1], O_RDONLY);
output = creat(argv[2], OUTPUT_MODE);
while (readSize > 0)
{
readSize = read(input, chars, BUF_SIZE);
if (readSize < 0)
exit(4);
for (int i = 0; chars[i] != '\0'; i++)
{
if (chars[i] == ' ')
{
count++;
}
}
writeSize = write(output, chars, readSize);
if (writeSize <= 0)
{
close(input);
close(output);
printf("%d words\n", count);
exit(5);
}
}
}
I am writing this answer because I think I know what your confusion is. You did not explain how you read the file, so I'll give an example and explain why we test != EOF, which is not a character that you read from a file.
It appears that you think EOF is a character that is stored in the file; it's not. If you just want to count words, you can do something like
int chr;
while ((chr = fgetc(file)) != EOF)
count += (chr == ' ') ? 1 : 0;
Note that chr MUST be of type int because EOF is of type int, and it is certainly not present in the file! It's returned by functions like fgetc() to indicate that there is nothing more to read; an attempt to read must be made before it is returned.
Oops, also note that my sample code will not count the last word. But that's for you to figure out.
Also, this counts every space, so runs of consecutive spaces inflate the word count, which is something else you should work out.
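One way to handle both the last word and repeated spaces is to count transitions from whitespace into a word rather than counting spaces. The sketch below assumes a "word" is any maximal run of non-whitespace characters (so "spaced-out" is one word and a trailing \r\n adds nothing); count_words is a hypothetical name. If you stay with read() as in the question, note that read() does not null-terminate the buffer, so the scanning loop there should be bounded by readSize rather than by '\0'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count maximal runs of non-whitespace characters. */
long count_words(FILE *fp)
{
    long words = 0;
    int in_word = 0;       /* 1 while the previous character was part of a word */
    int ch;                /* int, so EOF can be told apart from characters */

    assert(fp != NULL);
    while ((ch = fgetc(fp)) != EOF) {
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Under that definition, the sample line "spaced-out there's I'd like" counts as 4 words whether or not it ends in a newline.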

errors parsing int from text file c

Here's my code so far. Basically, I am reading in a text file and trying to save each line of input to a character array. The text file is read properly and saved to a character array, which I then traverse, attempting to save the digits to an int.
*This isn't the entire code piece, as it's an assignment that I'm working on. The current code provided is simply for debugging purposes.
//takes input of file, saves it to array
char word[20];
scanf("%s", word);
//File to open
char line[10];
char number[10];
FILE *file;
file = fopen(word, "r");
if (file) {
while (fgets(line, sizeof(line), file)) {
printf("%s", line);
int i,j=0;
int parsedInt;
for(i=2; i<sizeof(line) && !isspace(line[i]); i++)
{
number[j] = line[i];
j++;
}
sscanf(number, "%d", &parsedInt);
printf("PARSED INT %d \n\n", parsedInt);
parsedInt = 0;
Here's a sample input file, I have handled the i and d, which works fine.
i 10
i 12
d 10
i 3
And here's the sample output with those numbers:
i 10
PARSED INT 10
i 12
PARSED INT 12
d 10
PARSED INT 10
i 3
PARSED INT 32
Can someone explain why the last input gives a 32 instead of a 3 while the others are properly done?
Since we're not seeing everything, I'm going to take an educated guess that you're not clearing the number array before putting in values, so sscanf() reads past your actual input into characters left over from an earlier line. Either put a '\0' after your for loop or use memset() or similar to clear the array before use. So like:
memset(number, 0, sizeof(number));
for(i=2; i<sizeof(line) && !isspace(line[i]); i++) {
number[j] = line[i];
j++;
}
Clearing with 0 works because '\0' is guaranteed to have the value 0 in C.
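An alternative that avoids the intermediate number array altogether is to let sscanf() parse the whole line. This is only a sketch under the assumption that every line has the form shown in the sample input (one command letter, whitespace, one integer); parse_command is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: split a line such as "i 10" into its command
 * letter and integer. Returns 1 if both were found, 0 otherwise. */
int parse_command(const char *line, char *cmd, int *value)
{
    assert(line != NULL && cmd != NULL && value != NULL);
    /* " %c" skips leading whitespace, then reads one character;
     * "%d" skips whitespace and reads the integer. */
    return sscanf(line, " %c %d", cmd, value) == 2;
}
```

Checking the return value matters here: if a line does not match, cmd and value may be left unset, so the caller should skip or report that line instead of using them.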

Removing punctuation and capitalizing in C

I'm writing a program for school that asks to read text from a file, capitalizes everything, and removes the punctuation and spaces. The file "Congress.txt" contains
(Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.)
It reads in correctly but what I have so far to remove the punctuation, spaces, and capitalize causes some major problems with junk characters. My code so far is:
void processFile(char line[]) {
FILE *fp;
int i = 0;
char c;
if (!(fp = fopen("congress.txt", "r"))) {
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END);
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) {
fscanf(fp, "%c", &line[i]);
if (line[i] == ' ')
i++;
else if (ispunct((unsigned char)line[i]))
i++;
else if (islower((unsigned char)line[i])) {
line[i] = toupper((unsigned char)line[i]);
i++;
}
printf("%c", line[i]);
fprintf(csis, "%c", line[i]);
}
fclose(fp);
}
I don't know if it's an issue but I have MAX defined as 272 because that's what the text file is including punctuation and spaces.
My output I am getting is:
C╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠
╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠Press any key to continue . . .
The fundamental algorithm needs to be along the lines of:
while next character is not EOF
if it is alphabetic
save the upper case version of it in the string
null terminate the string
which translates into C as:
int c;
int i = 0;
while ((c = getc(fp)) != EOF)
{
if (isalpha(c))
line[i++] = toupper(c);
}
line[i] = '\0';
This code doesn't need the (unsigned char) cast with the functions from <ctype.h> because c is guaranteed to contain either EOF (in which case it doesn't get into the body of the loop) or the value of a character converted to unsigned char anyway. You only have to worry about the cast when you use char c (as in the code in the question) and try to write toupper(c) or isalpha(c). The problem is that plain char can be a signed type, so some characters, notoriously ÿ (y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), will appear as a negative value, and that breaks the requirements on the inputs to the <ctype.h> functions. This code will attempt to case-convert characters that are already upper-case, but that's probably cheaper than a second test.
What else you do in the way of printing, etc is up to you. The csis file stream is a global scope variable; that's a bit (tr)icky. You should probably terminate the output printing with a newline.
The code shown is vulnerable to buffer overflow. If the length of line is MAX, then you can modify the loop condition to:
while (i < MAX - 1 && (c = getc(fp)) != EOF)
If, as would be a better design, you change the function signature to:
void processFile(int size, char line[]) {
and assert that the size is strictly positive:
assert(size > 0);
and then the loop condition changes to:
while (i < size - 1 && (c = getc(fp)) != EOF)
Obviously, you change the call too:
char line[4096];
processFile(sizeof(line), line);
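Putting the pieces of this answer together gives something like the sketch below. It takes the already opened FILE * as a parameter and returns the number of letters stored; both choices, and the name read_upper_letters, are assumptions made so the example is self-contained, not part of the question's interface.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: store the upper-cased letters read from fp in line,
 * never writing more than size bytes including the '\0'.
 * Returns the number of letters stored. */
size_t read_upper_letters(FILE *fp, char line[], size_t size)
{
    size_t i = 0;
    int c;                              /* int: holds EOF or an unsigned char value */

    assert(fp != NULL && line != NULL);
    assert(size > 0);

    while (i < size - 1 && (c = getc(fp)) != EOF) {
        if (isalpha(c))                 /* drops spaces, digits and punctuation */
            line[i++] = (char)toupper(c);
    }
    line[i] = '\0';
    return i;
}
```

A call such as read_upper_letters(fp, line, sizeof line) then stops reading as soon as the buffer is full, so any remaining input is left unread rather than overflowing line.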
In the posted code there is no intermediate processing,
so the following code ignores the 'line[]' input parameter:
void processFile()
{
FILE *fp = NULL;
if (!(fp = fopen("congress.txt", "r")))
{
printf("File could not be opened for input.\n");
exit(1);
}
// implied else, fopen successful
int c; // must be int so EOF can be told apart from every character value
while( EOF != (c = fgetc(fp)) )
{
if( isalpha(c) ) // a...z or A...Z; spaces and punctuation are dropped
{
// note toupper has no effect on upper case characters
printf("%c", toupper(c));
fprintf(csis, "%c", toupper(c));
}
}
printf( "\n" );
fclose(fp);
} // end function: processFile
Okay, so what I did was create a second character array. My first array reads in the entire file. The second array takes only the alphabetical characters from the first array and makes them uppercase. My corrected and completed function for that part of my homework is as follows:
void processFile(char line[], char newline[]) {
FILE *fp;
int i = 0;
int j = 0;
if (!(fp = fopen("congress.txt", "r"))) { //checks file open
printf("File could not be opened for input.\n");
exit(1);
}
line[i] = '\0';
fseek(fp, 0, SEEK_END); //idk what they do but they make it not crash
fseek(fp, 0, SEEK_SET);
for (i = 0; i < MAX; ++i) { //reads the file into the first array
fscanf(fp, "%c", &line[i]);
}
for (i = 0; i < MAX; ++i) {
if (isalpha((unsigned char)line[i])){ //if it's an alphabetical character
newline[j] = line[i]; //copy into the new array
newline[j] = toupper((unsigned char)newline[j]); //makes that letter capitalized
j++;
}
}
fclose(fp);
}
Just make sure that after creating the new array, it will be smaller than your defined MAX. To make it easy I just counted the now missing punctuation and spaces (which was 50) so for future "for" loops it was:
for (i = 0; i < MAX - 50; ++i)

C - Can't get the number of lines in text file? Are there other ways to get?

I want my program to count the lines in a text file using a function. It used to work, but now it always returns 0.
What am I doing wrong?
#include <stdio.h>
int couLineF(FILE* fp){ //count lines in file
int count = 0,ch;
while((ch = fgetc(fp)) != EOF){
if(ch == (int)"\n" ) count++;
}
rewind(fp);
return count;
}
int main(){
FILE *fp = fopen("book.txt","r");
int lines;
if(fp){
lines = couLineF(fp);
printf("number of lines is : %d",lines);
}
return 0;
}
Another question
Are there any other ways to get number of lines in text file?
Your problem is here:
if(ch == (int)"\n" )
You are casting the address of "\n", a string literal, into an int and comparing it with ch. This doesn't make any sense.
Replace it with
if(ch == '\n' )
to fix it. This checks whether ch is a newline character. (Use single quotes (') to denote a character and double quotes (") for a string.)
Other problems are:
Not closing the file using fclose if fopen was successful.
Your program won't count the last line if it doesn't end with \n.
There is absolutely no reason to use rewind(fp) as you never use the FILE pointer again.
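A sketch of the counting function with the comparison fixed and the unterminated last line handled. It assumes a non-empty final line without a trailing \n should still count as a line; count_lines is a hypothetical replacement name for couLineF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the lines in fp, including a final line
 * that is not terminated by '\n'. An empty file has 0 lines. */
long count_lines(FILE *fp)
{
    long count = 0;
    int ch;
    int last = '\n';       /* treat "nothing read yet" like a line boundary */

    assert(fp != NULL);
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')    /* character constant, not a string literal */
            count++;
        last = ch;
    }
    if (last != '\n')      /* trailing text without a newline */
        count++;
    return count;
}
```

In main you would call count_lines(fp), print the result and then fclose(fp); no rewind is needed unless the file is read again afterwards.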
