Reading data in C - c

So my file has data of the form full_file_path # for example,
C:/dev/Java/src/java/util/concurrent/ConcurrentHashMap.java 212
C:/dev/Java/src/java/util/HashMap.java 212
C:/dev/Java/src/java/lang/CharacterData02.java 190
C:/dev/Java/src/java/lang/CharacterData0E.java 190
C:/dev/Java/src/java/nio/DirectCharBufferS.java 123
C:/dev/Java/src/java/nio/DirectCharBufferU.java 123
...
and I'm trying to read the file with
int dup;
char file[MAX_LINE];
...
FILE *fp;
fp = fopen("OUTPUT100.txt", "r");
while (fscanf(fp, "%s %d\n", &file, &dup) == 1) {
printf("%s %d\n", file, dup);
}
fclose(fp);
However the output is junk like
has 0 duplciate lines of code
RJ9 has 0 duplciate lines of code
has 0 duplciate lines of code
▒▒" has 0 duplciate lines of code
has 0 duplciate lines of code
"A▒ has 0 duplciate lines of code
has 0 duplciate lines of code
7▒cw has 0 duplciate lines of code
has 0 duplciate lines of code
has 0 duplciate lines of code
What am I doing wrong?
Edit: My "still spits out junk" comments were just brain farts. I had a printf in a loop below the while loop that was spitting out junk. After scrolling up a few hundred lines I started seeing sensible data. Commenting out the printf resulted in the expected results. The working read uses while (fscanf(fp, "%s %d\n", file, &dup) == 2). Thanks all.

Modify your loop call like this
while (fscanf(fp, "%s %d\n", file, &dup) == 2) {
As @mjswartz said, file is an array that decays to char * when you pass it to fscanf, so the & is not needed. Also, fscanf returns the number of input items it successfully assigned, and you are scanning two items per line.
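A small sketch of the two points above, assuming a hypothetical helper named count_fields that is not part of the original code. It uses sscanf on a string, which follows the same conversion and return-value rules as fscanf, so the return values are easy to inspect.

```c
#include <stdio.h>

/* Hypothetical helper: returns what sscanf reports for one input line.
   sscanf follows the same conversion and return-value rules as fscanf. */
int count_fields(const char *line)
{
    char file[256];
    int dup;

    /* `file` decays to char *, so no & is needed; %255s leaves room for '\0'. */
    return sscanf(line, "%255s %d", file, &dup);
}
```

With this helper, a complete line such as "HashMap.java 212" yields 2, a line whose second field is not a number yields 1, and an empty string yields EOF. That is why comparing the result against 1 never accepts a well-formed line.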

Well, fscanf returns the number of successfully converted format specifiers, so it will return 2 on success, and you're checking if it returns 1. You probably want to check for the EOF return value instead (EOF is a return value, not a character) and stop reading from there; stopping on any result other than 2 also catches malformed lines.
Also the file variable is an array, which can be used as a pointer, so you shouldn't take its address.
int r;
while (1)
{
    r = fscanf(fp, "%s %d\n", file, &dup);
    if (r != 2) /* EOF or a line that does not match */
        break;
    printf("%s %d\n", file, dup);
}

regarding this line:
while (fscanf(fp, "%s %d\n", &file, &dup) == 1) {
the fscanf() returns the number of successful input/conversions
I.E. if it is successful, and the format string has 2 conversion parameters, then it will return 2, NOT 1.
Please read/understand the man page for fscanf()
The format string should NOT have a trailing '\n' (unlike printf(), which almost always has a trailing '\n').
To consume any 'leftover' white space, like a newline, place a leading space in the format string.
The current format specifier '%s' does not have a max length modifier, so the data in the input file could overrun the available input buffer, resulting in undefined behaviour that can lead to a seg fault event.
Note: in C, the name of an array decays to a pointer to the first element of the array, so do not take the address of the array: file
suggest:
while (fscanf(fp, " %#s %d", file, &dup) == 2) {
where # is the value MAX_LINE-1, written as a literal number in the format string.
There is absolutely no possibility that the posted code output the text that is posted as the output.
Suggest posting the actual code, the actual output and the expected output.
Note: file is a poor name for an array of char. suggest something meaningful, like buffer
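The suggestions in this thread can be combined roughly as follows. This is only a sketch: print_entries, PATH_WIDTH and the STR macros are names made up for illustration, and 255 stands in for MAX_LINE-1, whose real value the question does not show.

```c
#include <stdio.h>

#define PATH_WIDTH 255              /* assumed limit; MAX_LINE - 1 in the question */
#define STR_(x) #x
#define STR(x) STR_(x)              /* turns PATH_WIDTH into the text "255" */

/* Hypothetical helper: prints every "path count" record and returns
   how many records were read. */
int print_entries(FILE *fp)
{
    char path[PATH_WIDTH + 1];
    int dup;
    int n = 0;

    /* Leading space skips leftover whitespace; the width caps what %s may store. */
    while (fscanf(fp, " %" STR(PATH_WIDTH) "s %d", path, &dup) == 2) {
        printf("%s %d\n", path, dup);
        n++;
    }
    return n;
}
```

The stringizing macros keep the width in the format string in step with the buffer size. Note that %s stops at whitespace, so this only works while the paths contain no spaces.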

Related

Read tab separated content line by line with last column empty string

I have a file format like this
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
...........................
I have to read the first three columns as float and the last column as a char array in C. All the columns are tab separated and there is a newline character at the end of each line. Everything works fine with fscanf(fp1, "%f\t%f\t%f\t%s\n", ...) as long as there is some text at the end of each line (the char string part).
There are cases where, instead of AA/BB/CC, I have an empty string in the file. How do I handle that case? I have tried fscanf(fp1, "%f\t%f\t%f\t%s[^\n]\n", ...) and many other things, but I am unable to figure out the right way. Can you please help me out here?
Using float rather than double will throw away about half the digits shown. You get 6-7 decimal digits with float; you get 15+ digits with double.
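To see the loss of digits for yourself, a sketch along these lines may help. The helper name float_loses_digits is made up for illustration; it parses the same text once as float and once as double and compares the results.

```c
#include <stdio.h>

/* Hypothetical helper: parses the same text as float and as double
   and reports whether the float lost information. */
int float_loses_digits(const char *text)
{
    float f;
    double d;

    if (sscanf(text, "%f", &f) != 1 || sscanf(text, "%lf", &d) != 1)
        return -1;                      /* not a number */

    printf("float:  %.17g\ndouble: %.17g\n", f, d);
    return (double)f != d;              /* 1 when the values differ */
}
```

For a value such as 130.81278270000001 the two results differ, while a value like 0.5, which is exactly representable in both types, comes out the same.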
As to your main question: use fgets() (or POSIX getline()) to read lines and then sscanf() to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use fscanf() and family to read the data. The file-reading scanf() functions don't care about newlines, even when you do.
Note that sscanf() will return either 3 or 4, indicating whether there was a string at the end of a line or not (or EOF, 0, 1 or 2 if it is given an empty string, or a string which doesn't start with a number, or a string which only contains one or two numbers). Always test the return value from scanf() and friends — but do so carefully. Look for the number of values that you expect (3 or 4 in this example), rather than 'not EOF'.
This leads to roughly:
#include <stdio.h>

int main(void)
{
    double d[3];
    char text[20];
    char line[4096];

    while (fgets(line, sizeof(line), stdin) != 0)
    {
        int rc = sscanf(line, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], &text[0]);
        if (rc == 4)
            printf("%13.6f %13.6f %13.6f [%s]\n", d[0], d[1], d[2], text);
        else if (rc == 3)
            printf("%13.6f %13.6f %13.6f -NA-\n", d[0], d[1], d[2]);
        else
            printf("Format error: return code %d\n", rc);
    }
    return 0;
}
If given this file as standard input:
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
19.20212223242525 29.3031323334353637 3940.41424344454647
19.20212223242525 29.3031323334353637 3940.41424344454647 PolyVinyl-PolySaccharide
the output is:
1.996920 2.461320 130.812783 [AA]
2.461320 2.554200 138.591316 [BB]
2.554200 2.995380 146.832384 [CC]
19.202122 29.303132 3940.414243 -NA-
19.202122 29.303132 3940.414243 [PolyVinyl-PolySacch]
You can tweak the output format to suit yourself. Note that the %19s avoids buffer overflow even when the text is longer than 19 characters.
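The remark that the scanf() family does not care about newlines can be illustrated with a short sketch. The function name scan_across_lines is hypothetical; it hands sscanf() two lines at once, which is how fscanf() sees the contents of a file.

```c
#include <stdio.h>

/* Hypothetical demo: scans a two-line string in one call, the way fscanf()
   sees a file. Returns the number of conversions; `text` receives the
   fourth field. */
int scan_across_lines(const char *input, char text[20])
{
    double d[3];

    text[0] = '\0';
    /* Whitespace in the format, and the skipping done by %s itself,
       match newlines as well as spaces and tabs. */
    return sscanf(input, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], text);
}
```

Given "1.5 2.5 3.5\n4.5 5.5 6.5 AA\n", the call reports four conversions and stores "4.5" in text: the missing label on the first line is silently filled with the first number of the second line. Reading one line at a time with fgets() keeps such a mistake from crossing a line boundary.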

How to Read and Compare string in a file then get the number/s beside it in C

This is the Algorithm:
1. Open the file.
2. Read a string from the file.
3. Compare the string.
   If the string matches, get the number/s after '=', then exit the loop.
4. Check the end of the file.
   If it's not yet the end of the file, go back to step 2.
   If the end of the file is met exit the loop, and go to step 5.
5. The string is not in the file.
This is the program I made:
FILE *fp = fopen("Converter.txt", "r");
char a;
while( fgets(line, sizeof(line), fp) != NULL ){
if(strcmp(line,a)){
printf("There is such string");
sscanf(line,"%*[^=]=%f", &num);
printf("\n\n%.2f",num);
}else{
printf("NULL\n");
continue;
}break;
}
inside of file:
Inch to Meter = 0.0254
Foot to Centimeter = 30.84
Foot to Inch = 12.00
and many more.
Input:
Foot to Centimeter
Output:
There is such string
30.84
The problem is that it only scans the first line, and it doesn't read the number beside the string that it compares.
Thanks.
The easy way to do this is to let sscanf do the string comparison for you:
while(fgets(line, sizeof(line), fp)) {
    double num;
    if (sscanf(line, " string =%lf", &num) == 1) {
        printf("Found line with 'string' and %f\n", num);
        break;
    }
}
sscanf will match the string string on the input line, followed by an = and a number. Note also the spaces in the format string -- these will match any amount of whitespace, so this will match all of the following lines:
string=1
string = 2
string =4
Of course, it will also match lines like:
string=5with extra stuff on the end...
and ignore everything after the number.
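Applied to the data file from the question, the same idea might look like the sketch below. The function name match_foot_to_cm is made up for illustration, and the key is hard-coded in the format string; building the format from user input at run time would need extra care and is not shown.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the number when `line`
   has the form "Foot to Centimeter = <number>", otherwise returns 0. */
int match_foot_to_cm(const char *line, double *num)
{
    /* Literal text must match exactly; each space in the format matches
       any run of whitespace, including none. */
    return sscanf(line, " Foot to Centimeter =%lf", num) == 1;
}
```

A line such as "Inch to Meter = 0.0254" fails at the first literal character, so sscanf returns 0 and num is left untouched.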
Just use the appropriate function to check that the line contains the string. Instead of
if (strcmp(line,a))
use
if (strstr(line, a) != NULL)
You should know that strcmp() does not do partial matching and that it returns 0 when there is a match, so in your case the condition is effectively always true (a line read by fgets() never equals the search string exactly). And since you don't check the return value of sscanf(), num is not modified when the conversion fails, leading either to undefined behavior (reading an uninitialized value) or to printing the previous value, depending on whether there was a match before.
Do check the return value from sscanf(), since ignoring it may cause problems.
And don't forget to check that the file was actually opened by fopen(); it returns NULL on failure.
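Putting the advice above together, a lookup could be sketched like this. The function name find_factor and its parameters are assumptions made for illustration; the format "%*[^=]=%lf" is the one from the question with %f changed to %lf so that it can store into a double.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: searches `fp` for a line containing `key` and
   stores the number after '=' in *out. Returns 1 if found, 0 otherwise. */
int find_factor(FILE *fp, const char *key, double *out)
{
    char line[256];

    if (fp == NULL)                       /* fopen() failed */
        return 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, key) == NULL)    /* key not on this line */
            continue;
        /* Skip everything up to '=', then read the number. */
        if (sscanf(line, "%*[^=]=%lf", out) == 1)
            return 1;
    }
    return 0;                             /* end of file: not found */
}
```

A call such as find_factor(fopen("Converter.txt", "r"), "Foot to Centimeter", &num) would then cover every step of the algorithm in the question, although real code should keep the FILE pointer so that it can be closed afterwards.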

Variable does not increment in loop that reads file

My code reads input in the following format:
The first line has just a number
The other lines have a number and 4 strings
The first line tells the number of following lines.
After reading the file, I want to verify if the number of lines read is the same as specified in the first line. In order to achieve it, I am trying to use a variable count_lines, incrementing it at each iteration of the while loop.
FILE *fp;
char line[MAXLINELEN];
int count_lines = 0;
char city[50], continent[13], cultural[1], outdoors[1];
int total_lines, id;
...
while(fgets(line, sizeof(line), fp))
{
if (count_lines == 0)
{
sscanf(line, "%d", &total_lines);
nodes2 = calloc(sizeof(node), total_lines);
}
else if (sscanf(line, "%d %s %s %s %s", ...)
{
/* code (previously some malloc and strcopy stuff, but the error occurs even without this part of the code) */
}
else
{
/* code */
}
count_lines++;
printf("point \n");
printf("%d\n", count_lines);
}
Data example:
-bash-4.1$ cat places
3
1 City1 Continent1 Y Y
2 City2 Continent1 Y Y
3 City3 Continent1 Y N
However, this is the output of running the code:
point
1
point
1
point
1
point
1
point
1
point
1
point
1
point
1
point
1
point
1
point
1
point
1
I verified that the problem has to do with the else if part. If I comment this part, the counting works correctly. However, I could not figure out why this is happening.
What's wrong with the code?
Note: As this is part of an assignment, I cannot post the whole code.
I omitted irrelevant parts with a /* code */ comment.
strcpy copies its second argument to its first argument. In your code you seem to assume the reverse
char *cont_temp = malloc(strlen(continent) + 1);
strcpy(city, cont_temp);
In this case you are copying from cont_temp to city. But cont_temp at this point contains garbage, while city contains the data you just read from file. That's one of the problems. Apparently it should be
strcpy(cont_temp, city);
shouldn't it?
However, I don't understand why you use strlen(continent) to determine the memory size, but then suddenly switch to working with city. That's another problem.
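For reference, the usual allocate-then-copy pattern looks roughly like this. The helper name copy_string is made up for illustration; the point is that the length is taken from the same string that is copied, and that the destination comes first in the strcpy call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `src`, or NULL on failure.
   The caller frees the result. */
char *copy_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1);   /* +1 for the '\0' */

    if (copy != NULL)
        strcpy(copy, src);                  /* destination first, source second */
    return copy;
}
```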
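The problem is that cultural and outdoors have size 1, so there is not enough space for the terminating null character that %s appends after the one character it reads. The write lands outside the array and can overwrite neighbouring variables such as count_lines. Defining them with size 2 solves the problem.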
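A minimal sketch of the sizing fix, assuming a hypothetical struct place and helper parse_place that do not appear in the question. The array sizes for city and continent are the ones from the question; the two flag fields get 2 bytes so that one character and the terminating '\0' both fit.

```c
#include <stdio.h>

/* Field sizes follow the question, except that the two flags get 2 bytes. */
struct place {
    int id;
    char city[50];
    char continent[13];
    char cultural[2];    /* one character plus '\0' */
    char outdoors[2];
};

/* Hypothetical helper: returns 1 when all five fields were parsed. */
int parse_place(const char *line, struct place *p)
{
    /* Each width is one less than the size of the array it fills. */
    return sscanf(line, "%d %49s %12s %1s %1s",
                  &p->id, p->city, p->continent,
                  p->cultural, p->outdoors) == 5;
}
```

Comparing the result with 5 also matters: a bare if (sscanf(...)) is true for any non-zero result, including EOF, so partially parsed lines would be treated as valid.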

Reading a specific line from a file in C

Okay, so after reading both: How to read a specific line in a text file in C (integers) and What is the easiest way to count the newlines in an ASCII file? I figured that I could use the points mentioned in both to both efficiently and quickly read a single line from a file.
Here's the code I have:
char buf[BUFSIZ];
intmax_t lines = 2; // when set to zero, reads two extra lines.
FILE *fp = fopen(filename, "r");
while ((fscanf(fp, "%*[^\n]"), fscanf(fp, "%*c")) != EOF)
{
/* globals.lines_to_feed__queue is the line that we _do_ want to print,
that is we want to ignore all lines up to that point:
feeding them into "nothingness" */
if (lines == globals.lines_to_feed__queue)
{
fgets(buf, sizeof buf, fp);
}
++lines;
}
fprintf(stdout, "%s", buf);
fclose(fp);
Now the above code works wonderfully, and I'm extremely pleased with myself for figuring out that you can fscanf a file up to a certain point, and then use fgets to read whatever data is at said point into a buffer, instead of having to fgets every single line and then fprintf the buf, when all I care about is the line that I'm printing: I don't want to be storing strings that I could care less about in a buffer that I'm only going to use once for a single line.
However, the only issue I've run into, as noted by the // when set to zero, reads two extra lines comment: when lines is initialized with a value of 0, and the line I want is like 200, the line I'll get will actually be line 202. Could someone please explain what I'm doing wrong here/why this is happening and whether my quick fix lines = 2; is fine or if it is insufficient (as in, is something really wrong going on here, and it just happens to work?)
There are two reasons why you have to set the lines to 2, and both can be derived from the special case where you want the first line.
On one hand, in the while loop the first thing you do is use fscanf to consume a line, then you check if the lines counter matches the line you want. The thing is that if the line you want is the one you just consumed you are out of luck. On the other hand you are basically moving through lines by finding the next \n and incrementing lines after you check if the current line is the one you're after.
These two factors combined cause the offset in the lines count, so the following is a version of the same function taking them into account. Additionally it also contains a break; statement once you get to the line you are looking for, so that the while loop stops looking further into the file.
void read_and_print_line(char * filename, int line) {
    char buf[BUFFERSIZE];
    int lines = 0;
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return; /* could not open the file */
    do
    {
        if (++lines == line) {
            fgets(buf, sizeof buf, fp);
            break;
        }
    } while((fscanf(fp, "%*[^\n]"), fscanf(fp, "%*c")) != EOF);
    if(lines == line)
        printf("%s", buf);
    fclose(fp);
}
Just as another way of looking at the problem… Assuming that your global specifies 1 when the first line is to be printed, 2 for the second, etc, then:
char buf[BUFSIZ];
FILE *fp = fopen(filename, "r");
if (fp == 0)
    return; // Error exit: report error.
for (int lineno = 1; lineno < globals.lines_to_feed__queue; lineno++)
{
    fscanf(fp, "%*[^\n]");
    if (fscanf(fp, "%*c") == EOF)
        break;
}
if (fgets(buf, sizeof(buf), fp) != 0)
    fprintf(stdout, "%s", buf);
else
    fprintf(stderr, "requested line not present in file\n");
fclose(fp);
You could replace the break with fclose(fp); and return; if that's appropriate (but do make sure you close the file before exiting; otherwise, you leak resources).
If your line numbers are counted from 0, then change the lower limit of the for loop to 0.
First, about what is wrong here: this code is unable to read the very first line in the file (what happens if globals.lines_to_feed__queue is 0?). It would also miscount lines should the file contain successive newlines.
Second, you must realize that there is no magic. Since you don't know at which offset the string in question lives, you have to patiently read file character by character, counting end-of-strings along the way. It doesn't matter if you delegate the reading/counting to fgets/fscanf, or fgetc each character for manual inspection - either way an uninteresting piece of file will make its way from the disk into the OS buffers, and then into the userspace for interpretation.
Your gut feeling is absolutely correct: the code is broken.
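The character-by-character approach described in this answer could be sketched as follows. The function name read_line_n and its parameters are assumptions made for illustration, and lines are counted from 1. Only the requested line is stored; everything before it is read and discarded.

```c
#include <stdio.h>

/* Hypothetical helper: copies line `n` (counted from 1) of `fp` into `buf`.
   Returns 1 on success, 0 if the file has fewer than n lines. */
int read_line_n(FILE *fp, long n, char *buf, size_t size)
{
    long current = 1;
    int c;

    /* Skip whole lines by counting newline characters. */
    while (current < n && (c = fgetc(fp)) != EOF) {
        if (c == '\n')
            current++;
    }
    if (current != n)
        return 0;                       /* hit end of file first, or n < 1 */

    return fgets(buf, (int)size, fp) != NULL;
}
```

Because every newline is counted individually, empty lines are handled like any other line, and the first line needs no special case.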

Program not working for large files in C

I'm using the following program in C to filter a log file with about 200,000 lines. But the program stops responding after about 12,000 lines. Is there any explanation for why this happens, and any solution to it?
The code is compiled in GCC (windows).
PS: The code is executing properly and giving desired output for small files.
#include<stdio.h>
#include<string.h>
int check(char *url)
{
//some code to filter the data and return either 0 or 1 depending upon input
}
int main()
{
FILE *fpi, *fpo;
fpi=fopen("access.log","r");
fpo=fopen("edited\\filter.txt","w");
char date[11],time[9],ip[16],url[500],temp[3];
while(!feof(fpi))
{
printf(".");
fscanf(fpi," %s %s %s %s %s %s",date,time,temp,ip,temp,url);
if(check(url))
fprintf(fpo,"%s %s %s %s %s %s\n",date,time,temp,ip,temp,url);
}
fclose(fpi);
fclose(fpo);
printf("\n\n\nDONE! :)");
return 0;
}
It is possible that one of the lines in the input file contains a field that is larger than the string variable you pass to fscanf(). It might result in a buffer overflow, which later results in an infinite loop somewhere. Just a speculation. I suggest you limit each %s in the fscanf() format string to the maximum length of the corresponding string variable.
For example, this will make sure that there are no buffer overflows and that the resulting strings are terminated:
fscanf(fpi," %10s %8s %2s %15s %2s %499s", date, time, temp, ip, temp, url);
date[10] = '\0';
time[8] = '\0';
ip[15] = '\0';
temp[2] = '\0';
url[499] = '\0';
Also, you are reading temp twice. The latter read will overwrite the former. Is this what you intended?
Another improvement, assuming that the input file is line-terminated and each log entry is on a separate line, is to use fgets() to read a line and only then use sscanf() on the intermediate buffer. This way you ensure that no formatting errors extend beyond a single line. Also, sscanf returns the number of items read, in your case 6. It would be safer to check the return value.
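The fgets() plus sscanf() approach might be sketched like this. The struct log_entry, the field names sep1 and sep2 (used instead of reading temp twice), the helper names and the 1024-byte line buffer are all assumptions made for illustration; the other buffer sizes are the ones from the question.

```c
#include <stdio.h>

/* Buffer sizes follow the question; sep1/sep2 replace the reused `temp`. */
struct log_entry {
    char date[11], time[9], sep1[3], ip[16], sep2[3], url[500];
};

/* Hypothetical helper: returns 1 when the line holds all six fields. */
int parse_log_line(const char *line, struct log_entry *e)
{
    return sscanf(line, " %10s %8s %2s %15s %2s %499s",
                  e->date, e->time, e->sep1, e->ip, e->sep2, e->url) == 6;
}

/* Counts well-formed lines; malformed ones are skipped, not fatal. */
long count_valid_lines(FILE *in)
{
    char line[1024];
    struct log_entry e;
    long n = 0;

    while (fgets(line, sizeof line, in) != NULL)
        if (parse_log_line(line, &e))
            n++;
    return n;
}
```

The loop ends when fgets() returns NULL, so it cannot spin forever on input that sscanf() refuses to convert. One limitation of this sketch: a field longer than its width is split across the following fields rather than rejected, and lines longer than the buffer are read in pieces.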

Resources