Read space on fscanf - c

I have a field in my text file, 'citepage', that is allowed to contain whitespace. Is it possible for fscanf to read a field that holds only blanks between two tabs, so that I can then show it with printf? At the moment citepage ends up holding the timestamp data.
Example .txt:
1[tab]AAAI[tab]Low-cost Outdoor Robot Platform for the Penn State Abington Mini Grand Challenge[tab]2005[tab]Robert Avanzato[tab]1[tab][space][tab]2013-03-07 16:49:1
My current code:
while (!feof(fp)){
fscanf(fp,"%d\t %19[^\t]\t %300[^\t]\t %d\t %100[^\t]\t %d\t %s\t %19[^\t]\n ",&artigos.id,artigos.sigla,artigos.titulo,&artigos.ano,artigos.autores,&artigos.citacoes,artigos.citepage,artigos.timestamp);
printf("\nid: %d ",artigos.id);
printf("\nsigla: %s ",artigos.sigla);
printf("\ntitulo: %s ",artigos.titulo);
printf("\nano: %d ",artigos.ano);
printf("\nautores: %s ",artigos.autores);
printf("\ncitacoes: %d ",artigos.citacoes);
printf("\ncitepage: %s ",artigos.citepage);
printf("\ntimestamp: %s ",artigos.timestamp);
}

fscanf is not good for separating things based on tabs or newlines as opposed to spaces, because it treats all whitespace as the same: something to be skipped and ignored. Whenever you have a whitespace character in your format string (it doesn't matter if it's a space or tab or newline; they all do the same thing), fscanf will read and throw away whitespace until it finds a non-whitespace character. So in your case, when it gets to the \t after the %d that read the citacoes, it will skip the following \t \t in the input, and the next character to be read will be 2, so that's where it will start reading for citepage.
Now you can use %*1[\t] in the format string to skip a single tab character (rather than all whitespace), but doing so is messy and error prone. It also gets easily confused by incorrect input, making it almost impossible to give the user proper diagnostics about malformed input. But if you want to do that, replace all the tabs in the format string with %*1[\t] and remove all the spaces and it should work.
A much better choice would be to read the entire line into a buffer (with fgets) and then use strsep to split it up on the tab characters.
Also you should never use feof -- it doesn't return true until after you've tried unsuccessfully to read past the end of the file. Always check the return value of the fscanf or fgets call instead.
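A minimal sketch of the read-a-line-then-split approach. It uses a plain strchr() loop instead of strsep(), because strsep() is not part of standard C. The helper name split_tabs is hypothetical and does not come from the question. Unlike a whitespace directive in fscanf, it keeps a field that is empty or holds only a space.

```c
#include <stdio.h>
#include <string.h>

/* Split `line` in place on single tab characters, keeping empty fields.
 * Stores at most `max` field pointers in `fields` and returns how many
 * were stored; any fields beyond `max` are dropped. */
static int split_tabs(char *line, char *fields[], int max)
{
    int n = 0;

    line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending left by fgets() */
    while (n < max) {
        fields[n++] = line;
        char *tab = strchr(line, '\t');
        if (tab == NULL)
            break;                       /* last field on the line */
        *tab = '\0';                     /* terminate the current field */
        line = tab + 1;                  /* next field starts after the tab */
    }
    return n;
}
```

Each line obtained with fgets(line, sizeof line, fp) can be handed to split_tabs(); the caller would then check that the count is 8 and convert the numeric fields with strtol().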

The '\t' and ' ' directives do the same thing: skip any white-space. To use a tab as a separator in scanf(), one needs to use "%*1[\t]". strtok() or a simple loop may be easier. See @Chris Dodd.
Suggest fgets()/sscanf() for better error handling especially for this complex scan.
Further, the format specifier for artigos.citepage needs adjustment.
char buf[512];  // must be large enough for the longest expected line
if (fgets(buf, sizeof buf, fp) == NULL) Handle_EOFIOError();
// scan, but do not save, exactly 1 '\t'
#define TF "%*1[\t]"
if (8 == sscanf(buf,
    "%d" TF "%19[^\t]" TF "%300[^\t]" TF "%d" TF
    "%100[^\t]" TF "%d" TF "%19[^\t]" TF "%19[^\t\n]",
    &artigos.id, artigos.sigla, artigos.titulo, &artigos.ano,
    artigos.autores, &artigos.citacoes, artigos.citepage, artigos.timestamp)) {
  Success();
}

Related

General questions about scanf and fscanf in C programming language

If I'm not wrong, library function int fscanf(FILE *stream, const char *format, ...) works
exactly the same as function int scanf(const char *format, ...) except that it requires stream selection.
For example if I wanted to read two ints from standard input the code would look something like this.
int first_number;
int second_number;
scanf("%d%d", &first_number, &second_number);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input? Function just looks for next decimal integer right? What happens when I enter two characters instead of ints? Why the function sometimes doesn't work if there's a space between format specifiers?
In addition to that, when reading from a file with fscanf(..), let's say the txt file contains the following lines:
P6
255
1920 1080
Do I need to specify next line characters in fscanf(..)? I read it like this.
FILE *input = ..
char type[2];
int tr;
int width; int height;
fscanf(input, "%s\n", &type);
fscanf(input, "%d\n", &tr);
fscanf(input, "%d %d\n", &width, &height);
Is there a need for \n to signal next line?
Can fscanf(..) anyhow affect any other functions for reading files like fread()? Or is it a good practice to just stick to one function through the whole file?
scanf(...) operates like fscanf(stdin, ....).
Unless it appears inside a "%[...]", any white-space character in a *scanf() format (' ', '\t', '\n', and also '\v', '\r', '\f') behaves the same way: it consumes zero or more white-space characters of input.
// All of these behave the same.
fscanf(input, "%d\n", &tr);
fscanf(input, "%d ", &tr);
fscanf(input, "%d\t", &tr);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input?
All format specifiers except "%n", "%[...]", "%c" consume optional leading white-spaces. With "%d" there is no need, other than style, to code a leading white-space in the format.
Function just looks for next decimal integer right?
Simply: yes.
What happens when I enter two characters instead of ints?
Scanning stops. The first non-numeric input remains in stdin as well as any following input. The *scanf() return value reflects the incomplete scan.
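A small sketch of how the return value reports an incomplete scan. It uses sscanf() on a string so the cases are easy to reproduce; parse_two_ints is a hypothetical helper name. A newline between the two numbers needs no matching character in the format, because "%d" skips leading white-space on its own.

```c
#include <stdio.h>

/* Returns the raw sscanf() result: 2 when both integers were read,
 * 1 when only the first was, 0 when the first token is not numeric,
 * and EOF when the input holds nothing to convert. */
static int parse_two_ints(const char *input, int *first, int *second)
{
    return sscanf(input, "%d%d", first, second);
}
```

With the input "10 xy" the call returns 1 and leaves "xy" unconsumed; with "xy" it returns 0. Code that only tests for a nonzero result would treat both 1 and EOF as success, so comparing against the expected count (here 2) is the safer check.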
Why the function sometimes doesn't work if there's a space between format specifiers?
Need example. Having spaces between specifiers is not an issue unless the following specifier is one of "%n", "%[...]", "%c".
When reading from file with fscanf(..), .... Do I need to specify next line characters in fscanf(..)?
No. fscanf() is not line oriented. Use fgets() to read lines. fscanf() is challenging to use to read a line. It takes something like:
char buf[100];
int cnt = fscanf(f, "%99[^\n]", buf);
if (cnt == 0) {
buf[0] = 0;
}
if (cnt != EOF) {
cnt = fscanf(f, "%*1[\n]");  // consume the single '\n' that ended the line, if present
}
I read it like this. ... fscanf(input, "%s\n", &type); fscanf(input, "%d\n", &tr); ....
"It", as in a line, is not read properly, because "%s", "%d" and "\n" each consume 0, 1, 2, ... '\n' and other white-space characters. They read neither a whole line nor just the one character written in the format.
Further "\n" does not complete upon reading 1 '\n', but continues reading all white-spaces until a non-white-space is detected (or end-of-file). Do not append such to the end of a format to read the rest of the line.
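To make the point about a trailing "\n" concrete, here is a sketch that uses "%n" to report how many characters "%d\n" actually consumed. The helper name consumed_by_int_newline is hypothetical, and sscanf() stands in for fscanf() so that the input is visible in the call.

```c
#include <stdio.h>

/* Returns the number of characters consumed by the format "%d\n",
 * or -1 if no integer could be read. */
static int consumed_by_int_newline(const char *input)
{
    int value;
    int used = -1;

    /* %n stores the count of characters read so far; it is not
     * included in the value returned by sscanf(). */
    if (sscanf(input, "%d\n%n", &value, &used) != 1)
        return -1;
    return used;
}
```

For the input "7\n\n  x" the result is 5: the digit, both newlines and both spaces are consumed, and scanning stops only at 'x'. On an interactive stream, this is why a format ending in "\n" appears to hang until some non-white-space text is typed.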
If you want to read the trailing '\n', code could use int cnt = fscanf(input, "%d%*1[\n]", &tr);, but code will not know if it succeeded in reading the trailing '\n' after the int. It will have simply read it if it was there. Could use other formats, but really, using fgets() to read a line is better.
Is there a need for \n to signal next line?
No, as a format "\n" reads 0 or more white-spaces, not just new-lines.
Can fscanf(..) anyhow affect any other functions for reading files like fread()?
Yes. All input functions affect what is available next for other input functions. Mixing fread() and fscanf() is challenging to get right.
is it a good practice to just stick to one function through the whole file?
It certainly is simpler. I recommend to use input functions as building blocks for a helper function to handle your file input.
Tip: Read lines with fgets(), then parse. Set fscanf() aside until you understand why it has so much trouble with unexpected input.
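As a sketch of the "read lines, then parse" tip applied to the three-line header from the question: struct header and read_header are hypothetical names, and the field order simply follows the sample file (type, then 255, then width and height). Note that the type buffer has 3 bytes, since "%s" also stores a terminating null character after "P6".

```c
#include <stdio.h>

struct header {
    char type[3];   /* two characters plus the terminating '\0' */
    int maxval;
    int width;
    int height;
};

/* Reads three lines from `in` and parses each one separately.
 * Returns 0 on success and -1 on a read or format error. */
static int read_header(FILE *in, struct header *h)
{
    char line[128];

    if (fgets(line, sizeof line, in) == NULL ||
        sscanf(line, "%2s", h->type) != 1)
        return -1;
    if (fgets(line, sizeof line, in) == NULL ||
        sscanf(line, "%d", &h->maxval) != 1)
        return -1;
    if (fgets(line, sizeof line, in) == NULL ||
        sscanf(line, "%d %d", &h->width, &h->height) != 2)
        return -1;
    return 0;
}
```

Because fgets() stops right after each newline, the stream is left at the first byte following the third line, which is a predictable position from which to continue with fread(). A format ending in "\n" would instead also swallow any white-space bytes that begin the data.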
The %d conversion specifier tells scanf and fscanf to skip over any leading whitespace, then read up to the first non-digit character, so you don't need to put a newline between the two %d in the scanf call. If you do put one there, it changes nothing: a newline in the format matches any amount of whitespace, not specifically a newline.
Most conversion specifiers skip over leading whitespace. The only ones that don't are %c, %[ and %n, so you'll want to be careful when using them.

Parsing string with sscanf that has a string in it

Project is in C. I need to parse strings that are always formatted the following way: integer, whitespace, plus sign, multi-word string, plus sign, white space, integer, whitespace, integer, end-of-line
Example:
10 +This is 1 string+ 2 -1
I'm having a hard time figuring out what to enter in the formatting of sscanf so that the string surrounded by the '+' signs get parsed correctly, without including the + signs. Assuming sscanf can be used for this case.
I tried "%d +%s+ %d %d" and that didn't work.
You use %s but that reads up to the first white space character. You want to read a string of not-plus-signs, so say that's what sscanf() should do:
"%d +%[^+]+ %d %d"
That's a scan set — see POSIX sscanf(). You should also protect yourself from buffer overflow. If you have:
char buffer[256];
use:
"%d +%255[^+]+ %d %d"
Note the off-by-one in the lengths — this is a design feature of the scanf() family of functions. You could skip leading spaces by putting a space after the first + in the format string. It is not possible to skip trailing spaces before the second + in the data; you'll have to remove those separately.
You ask for 'end of line' after the 3rd number. That's fairly hard. You might use:
"%d +%255[^+]+ %d %d %n"
passing an extra pointer to int argument to hold the offset of the last character parsed. The blank before the %n skips white space, including newlines, so if you read into int nbytes; (passing &nbytes), then you'd check if (buffer[nbytes] != '\0') { …handle trailing garbage… } (but only after checking that you had four successful conversion specifications — %n conversion specifications are not counted in the return value from sscanf() et al). There are other solutions to that; they're all grubby to some extent.
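Putting the pieces of this answer together, a sketch might look like the following. parse_record is a hypothetical helper name and the 256-byte text buffer matches the example size used above; the function accepts a line only if all four conversions succeed and nothing but white-space follows the last integer.

```c
#include <stdio.h>

/* Parses "int +text+ int int". Returns 1 on success, 0 on a format
 * error or when anything other than white-space trails the record. */
static int parse_record(const char *line, int *a, char text[256],
                        int *b, int *c)
{
    int end = -1;

    /* %n is not counted in the return value, so 4 means all four
     * real conversions succeeded. */
    if (sscanf(line, "%d +%255[^+]+ %d %d %n", a, text, b, c, &end) != 4)
        return 0;
    return end >= 0 && line[end] == '\0';
}
```

For the sample line "10 +This is 1 string+ 2 -1" this stores 10, "This is 1 string", 2 and -1. Spaces immediately before the closing '+' would still end up in text and need trimming separately, as noted above.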

Difference between return values of scanf("%s", str) and scanf("%[^\n]s", str)

while(scanf("%s",a))
{
if(a[0]=='#')
break;
nextpermutation(a);
}
This while loop perfectly works for scanf("%s", a)
but if I put scanf("%[^\n]s", a) in the while it just runs for one time.
I checked the return value of both scanf and they were the same still.
I didn't get why this is happening...
The battle of two losers: "%s" vs. "%[^\n]s":
Both specifiers are bad and do not belong in robust code. Neither limits user input and can easily overrun the destination buffer.
The s in "%[^\n]s" serves no purpose (@Jonathan Leffler), so only "%[^\n]" is reviewed below. It is still bad.
"%s" consumes leading white-space. "%[^\n]" does not (@Igor Tandetnik).
"%[^\n]" reads nothing if the leading character is a '\n', leaving the array a unchanged or, rarely, in an unknown state.
"%[^\n]" reads all characters up to, but not including, the next '\n'.
"%s" reads all leading white-space, discards them, then reads and saves all non-white-spaces.
Better to use fgets() to read a line of user input and then parse it.
Example: to read a line of input, including spaces:
char a[100];
if (fgets(a, sizeof a, stdin) == NULL) Handle_EOF_or_Error();
// if the potential trailing \n is not wanted
a[strcspn(a, "\n")] = '\0';
What is that s doing there in %[^\n]s?
Your scanf("%[^\n]s") contains two format specifiers in its format string: %[^\n] followed by a lone s. This will require the input stream to contain s after the sequence matched by %[^\n]. This, of course, is not very close in behavior to plain %s. Matching sequences for these formats will be very different for that reason alone.
Note that [] is not a modifier for %s format, as you seem to incorrectly believe. [] in scanf is a self-sufficient format specifier. Did you mean just %[^\n] by any chance instead of %[^\n]s?
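The difference in return values can be seen in a small sketch that feeds both formats the same input through sscanf(). scan_word and scan_line are hypothetical helper names, and the width of 99 assumes a 100-byte destination. The key case is input that begins with the '\n' left over from the previous line.

```c
#include <stdio.h>

/* "%s": skips leading white-space, then reads one word. */
static int scan_word(const char *input, char out[100])
{
    return sscanf(input, "%99s", out);
}

/* "%[^\n]": does not skip anything; fails if the first character is '\n'. */
static int scan_line(const char *input, char out[100])
{
    return sscanf(input, "%99[^\n]", out);
}
```

On "\nabc def", scan_word() returns 1 and stores "abc", while scan_line() returns 0 because the very first character is excluded from the scan set. In a loop such as while (scanf(...)), that 0 on the second iteration ends the loop, which matches the "runs only once" behaviour described in the question.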

Reading text with sscanf and fgets

So my text file looks similar to this
1. First 1.1
2. Second 2.2
Essentially an integer, string and then a float.
Using sscanf() and fgets(), in theory I should be able to scan this in (I have to do it in this format), but I only get the integer. Can someone help point out what I am doing wrong?
while(!feof(foo))
{
fgets(name, sizeof(name) - 1, foo);
sscanf(name,"%d%c%f", &intarray[i], &chararray[i], &floatarray[i]);
i++;
}
Where intarray, chararray, and floatarray are 1D arrays and i is an int initialized to 0.
The structure of the loop is wrong; you should not use feof() like that and you must always check the status of both fgets() and sscanf(). This code avoids overflowing the input arrays, too.
enum { MAX_ENTRIES = 10 };
int i;
int intarray[MAX_ENTRIES];
float floatarray[MAX_ENTRIES];
char chararray[MAX_ENTRIES][50];
for (i = 0; i < MAX_ENTRIES && fgets(name, sizeof(name), foo) != 0; i++)
{
if (sscanf(name,"%d. %49s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
...process format error...
}
Note the major changes:
The dot after the integer must be scanned by the format string.
The chararray has to be a 2D array to make any sense. If you read a single character with %c directly after %d, it would pick up the dot (or, once the dot is matched, the space that follows it), and the subsequent conversion specification (for the float value) would fail because the name in the string, such as 'First', is not a floating point value.
The & in front of chararray[i] is not wanted when it is a 2D array. It would be needed if you were really reading a single character in a 1D array of characters instead of the whole string such as 'First' or 'Second' from the sample data.
The test checks that three values were converted successfully. Any smaller value indicates problems. With sscanf(), you'd only get EOF returned if there was nothing in the string for the first conversion specification to work on (empty string, all white space); you'd get 0 returned if the first non-blank was alphabetic or a punctuation character other than + or -, etc.
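The effect of these changes can be checked with a short sketch that runs the question's format and the corrected one over the same sample line. scan_original and scan_fixed are hypothetical wrapper names; the formats themselves are the ones quoted above.

```c
#include <stdio.h>

/* The question's format: no dot, and %c for the name. */
static int scan_original(const char *line, int *n, char *c, float *f)
{
    return sscanf(line, "%d%c%f", n, c, f);
}

/* The corrected format: literal dot, bounded %s, then the float. */
static int scan_fixed(const char *line, int *n, char word[50], float *f)
{
    return sscanf(line, "%d. %49s %f", n, word, f);
}
```

On "1. First 1.1", scan_original() returns 2: the integer is read, %c takes the '.', and %f then fails on "First". scan_fixed() returns 3 with the values 1, "First" and 1.1, so comparing the result against 3 separates the two cases.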
If you really want a single character instead of the name, then you'll have to arrange to read the extra characters in the word, maybe using:
if (sscanf(name,"%d. %c%*s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
There's a space before the %c which is crucial; it will skip white space in the input, and then the %c will pick up the first non-blank character. The %*s will read more characters, skipping any white space (there won't be any) and then scanning a string of characters up to the next white space. The * suppresses an assignment; the scanned data won't be stored anywhere.
One of the major advantages of the fgets() plus sscanf() paradigm is that when you report the format error, you can report to the user the complete line of input that caused problems. If you use raw fscanf() or scanf(), you can only report on the first character that caused trouble, typically up to the end of the line, and then only if you write code to read that data. It is fiddlier (so the reporting is usually not very careful), and the available information is not as helpful to the user on those rare occasions when the reporting tries to be careful.
You need to change your format string to:
"%d %s %f"
The spaces are because you have spaces in your input data, the %s because you want to read a multi-character string at that point (%c only reads one character); don't worry though, as %s won't read past a space. You'll need to make sure you've got enough space in the target buffer to read the string, of course.
If you only want the first character of the second word, try:
"%d %c%s %f"
And add an extra (dummy) buffer to receive the string parsed by %s which you want to discard.
Won't it need %s for the string? Otherwise %c will read only one character, and the float value might then be affected.
Try "%d %s %f".
%s won't help, since it may read the float value itself. As far as I know, %c reads a single character and then looks for a space, which leads to the problem. To scan the word, you can use a loop (terminated by a space, of course).

Making fscanf Ignore Optional Parameter

I am using fscanf to read a file which has lines like
Number <-whitespace-> string <-whitespace-> optional_3rd_column
I wish to extract the number and string out of each column, but ignore the 3rd_column if it exists
Example Data:
12 foo something
03 bar
24 something #randomcomment
I would want to extract 12,foo; 03,bar; 24, something while ignoring "something" and "#randomcomment"
I currently have something like
while(scanf("%d %s %*s",&num,&word)>=2)
{
assign stuff
}
However this does not work with lines with no 3rd column. How can I make it ignore everything after the 2nd string?
The problem is that the %*s is eating the number on the next line when there's no third column, and then the next %d is failing because the next token is not a number. To fix it without using fgets() followed by sscanf(), you can use the character class specifier:
while(scanf("%d %s%*[^\n]", &num, &word) == 2)
{
assign stuff
}
The [^\n] says to match as many characters as possible that aren't newlines, and the * suppresses assignment as before. Also note that you can't put a space between the %s and the %*[^\n], because otherwise that space in the format string would match the newline, causing the %*[^\n] to match the entire subsequent line, which is not what you want.
It would appear to me that the simplest solution is to scanf("%d %s", &num, &word) and then fgets() to eat the rest of the line.
Use fgets() to read a line at a time and then use sscanf() to look for the two columns you are interested in, more robust and you don't have to do anything special to ignore trailing data.
I often use fgets() (not gets(), which has no bounds checking and was removed from the C standard in C11) followed by an sscanf() on the string you just, er, gots.
Bonus: you can separate the test for end-of-input from the parsing.
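A sketch of the fgets() plus sscanf() approach for the optional third column. parse_two_columns is a hypothetical helper name and the 32-byte word buffer is an assumed size. Since each call sees exactly one line, whatever follows the second column is ignored without any special handling.

```c
#include <stdio.h>

/* Extracts the number and the first word from one line of input.
 * Anything after the second column is ignored.
 * Returns 1 on success, 0 if the line does not start with both fields. */
static int parse_two_columns(const char *line, int *num, char word[32])
{
    return sscanf(line, "%d %31s", num, word) == 2;
}
```

The caller would read each line with fgets(line, sizeof line, stdin) in the loop condition and pass it to parse_two_columns(), so that end of input and malformed lines are reported separately.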
