Here's my code:
FILE* fp,*check;
fp=fopen("file.txt","r");
check=fp;
char polyStr[10];
while(fgetc(check)!='\n')
{
    fscanf(fp,"%s",polyStr);
    puts(polyStr);
    check=fp;
}
while(fgetc(check)!=EOF)
{
    fscanf(fp,"%s",polyStr);
    puts(polyStr);
    check=fp;
}
Now if my file.txt is:
3,3, 4,4, 5,5
4,1, 5,5, 12,2
Now output is:
,3,
4,4,
5,5,
,1,
5,5,
12,2,
Now why is the first character of both the lines not getting read?
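Your fgetc call is eating the character. check and fp point to the same stream, so every character that fgetc(check) reads is already gone by the time fscanf(fp, ...) runs.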
You should read entire lines with fgets and then parse them with the strtol family. You should never use any of the *scanf functions.
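A minimal sketch of the fgets-plus-strtol approach, assuming the input format shown in the question (comma-separated pairs, with an optional trailing comma after each pair). The function name parse_pairs and its array parameters are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse "coef,exp, coef,exp, ..." from one line of text.
 * Stores up to max pairs and returns how many were parsed.
 * parse_pairs, coef[] and expo[] are illustrative names only. */
int parse_pairs(const char *line, int coef[], int expo[], int max)
{
    int count = 0;
    const char *p = line;

    while (count < max) {
        char *end;

        long c = strtol(p, &end, 10);   /* strtol skips leading white space */
        if (end == p)                   /* no digits: end of line or bad data */
            break;
        p = end;
        if (*p != ',')                  /* a pair needs a comma in the middle */
            break;
        p++;

        long e = strtol(p, &end, 10);
        if (end == p)
            break;
        p = end;
        if (*p == ',')                  /* optional trailing comma */
            p++;

        coef[count] = (int)c;
        expo[count] = (int)e;
        count++;
    }
    return count;
}
```

Each line would first be read with fgets(line, sizeof line, fp) and then handed to parse_pairs. The parsing works on the buffer only, so no characters are silently consumed from the stream.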
Let's talk about the format of the input data first. Your list would seem to be better formatted if you only had <coef>,<exp> without the trailing comma. In this way, you would have a nice pattern with which to match. So you could do something like:
fscanf(filep, "%d,%d", &coef, &exp)
to get the values. You should check the return value from fscanf to be sure that you are reading 2 fields. So if a line consisted of a series of '<coef>,<exp><white-space>' items (where white-space is either one blank or one newline), then you would be able to do the following:
do {
    if (fscanf(filep, "%d,%d", &coef, &exp) != 2)
        break; /* malformed pair or end of input */
} while (fgetc(filep) != '\n');
This code allows you to get the pairs until you eat the end of line. The while conditional will eat either the blank or the newline. You can wrap this in another loop for processing several lines.
Note that I have NOT tested this code, but the gist of it should be clear. Comment if you have any more questions.
I was wondering if it is possible to only read in particular parts of a string using scanf.
For example, since I am reading from a file, I use fscanf.
If I wanted to read the name and the number (where the number is 111-2222) when they are in a string such as:
Bob Hardy:sometext:111-2222:sometext:sometext
I use this, but it's not working:
(fscanf(read, "%23[^:] %27[^:] %10[^:] %27[^:] %d\n", name,var1, number, var2, var3))
Your initial format string fails because it does not consume the : delimiters.
If you want scanf() to read a portion of the input, but you don't care what is actually read, then you should use a field descriptor with the assignment-suppression flag (*):
char nl;
fscanf(read, "%23[^:]:%*[^:]:%10[^:]%*[^\n]%c", name, number, &nl);
As a bonus, you don't need to worry about buffer overruns for fields with assignment suppressed.
You should not attempt to match a single newline via a trailing newline character in the format, because a literal newline (or space or tab) in the format will match any run of whitespace. In this particular case, it would consume not just the line terminator but also any leading whitespace on the next line.
The last field is not suppressed, even though it will almost always receive a newline, because that way you can tell from the return value if you've scanned the last line of the file and it is not newline-terminated.
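As a rough illustration of assignment suppression on its own, here is a sketch that parses an in-memory line with sscanf rather than reading from the file. The function name parse_contact is hypothetical, and the buffer sizes follow the field widths used in the question:

```c
#include <stdio.h>

/* Extract the first and third colon-separated fields of a record.
 * name needs room for 24 bytes and number for 11 (width + terminator).
 * Returns 1 on success, 0 if the line does not match.
 * parse_contact is an illustrative name, not from the original post. */
int parse_contact(const char *line, char name[24], char number[11])
{
    /* %*[^:] reads the second field and throws it away, so it needs
     * no destination argument and is not counted in the return value. */
    return sscanf(line, "%23[^:]:%*[^:]:%10[^:]", name, number) == 2;
}
```

For the sample line in the question this should store "Bob Hardy" in name and "111-2222" in number. Only the two stored fields count toward the return value, which is why the check is against 2.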
Check fscanf() return value.
fscanf(read, "%23[^:] %27[^:] ... fails because, after the first field is scanned with %23[^:], the next input character is a ':'. The white space in the format matches zero or more white-space characters, so it consumes nothing, and the following %27[^:] needs at least one non-':' character. The ':' cannot match, so scanning stops.
Had the code checked the return value of fscanf(), which was certainly 1, the source of the problem might have been evident. The scan needs to consume the ':', so add it to the format: "%23[^:]: %27[^:]: ...
Better to use fgets()
Using fscanf() to read data and to tell properly formatted input from improperly formatted input is very challenging. It can be made to scan expected input correctly, yet it rarely copes well with incorrectly formatted input.
Instead, simply read a line of data and then parse it. Using "%n" is an easy way to detect a complete conversion, as it saves the count of characters scanned so far, but only if scanning gets that far.
char buffer[200];
if (fgets(buffer, sizeof buffer, read) == NULL) {
    return EOF;
}
int n = 0;
sscanf(buffer, " %23[^:]: %27[^:]: %10[^:]: %27[^:]:%d %n",
       name, var1, number, var2, &var3, &n);
if (n == 0) {
    return FAIL; // scan incomplete
}
if (buffer[n]) {
    return FAIL; // Extra data on line
}
// Success!
Note: sample input ended with text, but original format used "%d". Unclear on OP's intent.
I ran into a problem today. I can't find a way to check if a line in a file is over and the words are read from the next one already. I read word by word from the file using fscanf, then process the word as I need to and print it out into another file but there is a problem.
for example my data file is:
Hello, how are you
doing?
and the result file shows:
Hello, how are you doing?
But I need the words to stay on the same lines I took them from. Please keep in mind that I need the words one by one, which is why I don't use getline().
Here is the code I use to read words from the file:
while( fscanf(file, "%s", A) != EOF )
{
check(A, B, &a); // I edit the words and put them in B string
// which is printed to the write file
}
Thank you for any tips!
Read the line into a string with getline() or fgets(), then use sscanf to get the words out of this string.
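The read-a-line-then-split idea could look roughly like this. It is only a sketch: the function name copy_words_by_line is invented, the buffer sizes are assumptions (lines under 256 bytes, words under 64), and the place where each word would be edited is marked with a comment rather than implemented:

```c
#include <stdio.h>

/* Copy src to dst one word at a time, keeping the original line breaks.
 * Words on a line are re-joined with single spaces.
 * Assumes lines shorter than 256 bytes and words shorter than 64;
 * copy_words_by_line is an illustrative name. */
void copy_words_by_line(FILE *src, FILE *dst)
{
    char line[256];
    char word[64];

    while (fgets(line, sizeof line, src) != NULL) {
        const char *p = line;
        int used = 0;
        int first = 1;

        /* %n records how many bytes this call consumed, so p can advance
         * to the next word; sscanf returns EOF once the line is used up. */
        while (sscanf(p, "%63s%n", word, &used) == 1) {
            /* the word would be processed here before being written */
            fprintf(dst, "%s%s", first ? "" : " ", word);
            first = 0;
            p += used;
        }
        fputc('\n', dst);   /* one output line per input line */
    }
}
```

Because the inner loop never looks past the end of the current buffer, the line boundary is known at the point where the newline is written, which is the information a plain fscanf("%s") loop throws away.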
Alternatively, you could use simpler logic, such as matching characters like . or ? that usually end a sentence, although that only helps when sentences and lines coincide.
You need to add a check for the end of the line.
The end of a line is represented by the newline character, '\n'. So inside the while loop, instead of copying everything in one go, process the input line by line by checking for '\n'.
I have a tab delimited file that I am trying to convert to a tab delimited file. I am using C. I am getting stuck on trying to read the second line of the file. Right now I just get tens of thousands of lines repeating the first line.
#include <stdio.h>
#include <string.h>
#define SELLERCODE A2LQ9QFN82X636
int main ()
{
typedef char* string;
FILE* stream;
FILE* output;
string asin[200];
string sku[15];
string fnsku[15];
int quality = 0;
stream = fopen("c:\\out\\a.txt", "r");
output = fopen("c:\\out\\output.txt", "w");
if (stream == NULL)
{
perror("open");
return 0;
}
for(;;)
{
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
printf("%s\t%s\n", sku, fnsku);
fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
}
Prefer fgets() to read the input and parse the lines in your program, using, for example, sscanf() or strtok().
fscanf is notoriously difficult to use.
Your fscanf is not performing any conversions after the first line.
It reads characters up to a TAB, then ignores the TAB, and reads more characters up to the next TAB. On the 2nd time through the loop, there is no data for sku: the 1st character is a TAB.
Do check the return value though. It helps enormously.
chk = fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
/* 2 conversions: sku and fnsku */
if (chk != 2) {
/* something went wrong */
}
You are reading with
fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku);
After the first record is read, the tab character '\t' that terminated the second %[^\t] is still in the input buffer, because nothing in the format consumes it. On the next iteration the first %[^\t] meets that tab immediately, matches nothing, and fscanf returns at once, so sku and fnsku still hold the values from the first read. Every later iteration fails in the same way at the same tab. The file position never advances, and the values from the first read are printed over and over.
You need to read the character that terminated the scanset match. You can either call fgetc(stream) after the fscanf() call or use the following format string: "%[^\t]\t%[^\t]%*c". The %*c uses the assignment-suppression syntax: it reads one character from the input file and then discards it.
You should also check what fscanf() returns. If it does not return 2 (the number of elements to read), then there is a problem that you should handle. This way you can make sure that the expected number of elements was read by each call.
So either you can do:
while (fscanf(stream, "%[^\t]\t%[^\t]", sku, fnsku) == 2)
{
    fgetc (stream); /* consume the tab that ended the second field */
    printf("%s\t%s\n", sku, fnsku);
    fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
Or you can do:
while (fscanf(stream, "%[^\t]\t%[^\t]%*c", sku, fnsku) == 2)
{
    printf("%s\t%s\n", sku, fnsku);
    fprintf(output, "%s\t%s\t%\t%s\t%s\t%i\n", sku, fnsku, asin, quality);
}
But I would recommend reading each line with fgets() and then parsing it inside your program with strtok() or some other means.
EDIT1:
Note that if the lines of the original file are terminated with '\n', then reading them as above puts an extra newline into your buffers. If you still want to read the fields directly with fscanf(), where each line has multiple fields separated by '\t' and each entry is terminated by '\n', then you should use the following format string: "%[^\t]\t%[^\t]\n".
It is difficult to answer without knowing the exact format of the file. Does the file contain a single line with fields separated by tabs? Or are there multiple lines, each with tab-separated fields? If the latter is true, it is best to read the whole line at once and then parse it internally.
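For the multi-line case, a sketch of the split step might look like this, assuming each line has already been read into a writable buffer with fgets. The function name split_tabs and the fields array are illustrative, not part of the original program:

```c
#include <string.h>

/* Split one line into tab-separated fields, in place.
 * Stores up to max pointers into line and returns the field count.
 * Caveat: strtok treats a run of tabs as a single separator, so
 * empty fields are skipped rather than reported.
 * split_tabs is an illustrative name. */
int split_tabs(char *line, char *fields[], int max)
{
    int count = 0;
    char *tok = strtok(line, "\t\n");   /* '\n' strips the line ending */

    while (tok != NULL && count < max) {
        fields[count++] = tok;
        tok = strtok(NULL, "\t\n");
    }
    return count;
}
```

A caller would loop on fgets(line, sizeof line, stream), call split_tabs on each line, and skip or report any line whose field count is not the expected one. If empty fields matter, a hand-written scan with strchr would be a better fit than strtok.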
Ok, here is what is actually happening. You are reading the first line, and from then on you aren't reading anything and just reusing those values. You should check the return value of fscanf and exit the loop if it is less than two (which it will be after the first iteration). Your fscanf line should look like this:
if( fscanf(stream, "%[^\t]\t%[^\t]\n", sku, fnsku) < 2 ) break;
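The key is the trailing \n in the format. White space in a scanf format consumes any run of white space in the input, so it removes the character that stopped the second field instead of leaving it for the next call.
There are some problems with your printf as well. (Incorrect number of format specifiers.) I'll leave that to you.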
I have been struggling to figure out the fscanf formatting. I just want to read in a file of words delimited by spaces. And I want to discard any strings that contain non-alphabetic characters.
char temp_text[100];
while(fscanf(fcorpus, "%101[a-zA-Z]s", temp_text) == 1) {
printf("%s\n", temp_text);
}
I've tried the above code both with and without the 's'. I read in another Stack Overflow thread that an s used like that is interpreted as a literal 's' and not as a string conversion. Either way, with or without the s, I can only get the first word of the file I am reading to print out.
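The %[ scan specifier does not skip leading white space. Either add a space before it or put one at the end in place of your s. You also have your 100 and 101 backwards, which is a serious buffer overflow bug: a 100-byte buffer can hold at most 99 characters plus the terminating '\0', so the width must be 99 or less.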
The s isn't needed.
Here are a few things to try:
Print out the return value from fscanf, and make sure it is 1.
Make sure that the fscanf is consuming the whitespace by using fgetc to get the next character and printing it out.
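Since the stated goal is to discard whole words that contain non-alphabetic characters, one possible approach (a different technique from the %[a-zA-Z] scanset in the question) is to read each word with a bounded %s and test it afterwards. The names is_alpha_word and print_alpha_words are made up for this sketch:

```c
#include <ctype.h>
#include <stdio.h>

/* Return 1 if word is non-empty and made only of letters, else 0. */
int is_alpha_word(const char *word)
{
    if (*word == '\0')
        return 0;
    for (; *word != '\0'; word++) {
        if (!isalpha((unsigned char)*word))
            return 0;
    }
    return 1;
}

/* Print every purely alphabetic word of in to out, one per line.
 * Returns the number of words printed. A word longer than 99 bytes
 * is split into pieces by the width limit. */
int print_alpha_words(FILE *in, FILE *out)
{
    char word[100];
    int kept = 0;

    /* %99s skips leading white space and leaves room for the '\0' */
    while (fscanf(in, "%99s", word) == 1) {
        if (is_alpha_word(word)) {
            fprintf(out, "%s\n", word);
            kept++;
        }
    }
    return kept;
}
```

Unlike %[, the %s conversion skips leading white space by itself, so the loop keeps advancing through the file, and a word such as "qu1ck" is read whole and then rejected instead of stalling the scan at the digit.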
I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:
#import "some/file/somewhere.css";
To do this, I have the following loop set up:
FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);
while(!feof(file))
{
// %*[^#] : Read and discard all characters up to a '#'
// %8999[^;] : Read up to 8999 characters starting at '#' to a ';'.
if(fscanf(file, "%*[^#] %8999[^;]", buffer) == 1)
{
// Do stuff with the matching characters here.
// This code is long and not relevant to the question.
}
}
This works perfectly SO LONG AS the VERY FIRST character in the file is not a '#'. (Literally, a single space before the first '#' character in the CSS file will make the code run fine.)
But if the very first character in the CSS file is a '#', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.
I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?
Thank you.
I'm not an expert on scanf pattern syntax, but my interpretation of yours is:
Match a non-empty sequence of non-'#' characters, then
Match a non-empty sequence of up to 8999 non-';' characters
So yes, if your string starts with a '#', then the first part will fail.
I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data string, i.e. simply " %8999[^;]".
Oli already said why fscanf failed. Since failure is a normal outcome for fscanf, your busy loop is a consequence not of the fscanf failure itself but of the missing handling for it.
You have to handle an fscanf failure even if your format were correct (in your special case), because you cannot be sure that the input can always be matched by the format. In fact, you can be sure that far more non-matching input exists than matching input.
Your format string does the following actions:
Read (and discard) 1 or more non-# characters
Read (and discard) 0 or more whitespace characters (due to the space in the format string)
Read and store 1 to 8999 non-; characters
Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.
If you don't care about multiple #import statements on a line, you could change your code to read a single line (with fgets) and then extract the #import statement from that. If the first character of the line is not '#', you can use your current format string with sscanf; otherwise, you could use sscanf(line, "%8999[^;]", buffer).
If multiple #import statements on a line should be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.
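A sketch of the getc/ungetc idea follows. It replaces the %*[^#] conversion with a manual skip loop, which can match zero characters and therefore also works when the file starts with '#'. The function name next_statement is hypothetical, and buf is assumed to hold at least 9000 bytes, as in the question:

```c
#include <stdio.h>

/* Read the next "#...;" statement from fp into buf (at least 9000 bytes).
 * Returns 1 if a statement was stored, 0 at end of input.
 * The terminating ';' is consumed but not stored.
 * next_statement is an illustrative name. */
int next_statement(FILE *fp, char *buf)
{
    int c;

    /* Skip zero or more characters until the next '#'. */
    while ((c = getc(fp)) != EOF && c != '#')
        ;
    if (c == EOF)
        return 0;
    ungetc(c, fp);                  /* put the '#' back for fscanf */

    if (fscanf(fp, "%8999[^;]", buf) != 1)
        return 0;

    c = getc(fp);                   /* drop the ';' that ended the match */
    if (c != ';' && c != EOF)
        ungetc(c, fp);              /* width limit hit: keep the character */
    return 1;
}
```

The calling loop then becomes while (next_statement(file, buffer)) { ... }, which ends cleanly at end of input instead of spinning on a failed conversion the way the while(!feof(file)) loop does.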