How to detect a newline character with fscanf? - c

How can I tell that fscanf has reached a newline (\n) in a file?
I have been using my own functions to do that. I know I can use fgets and then sscanf for the pattern I need, but my requirements vary: sometimes I want to get tab-separated strings, sometimes newline-separated strings, and sometimes strings separated by some special character. So if there is any way to detect a newline with fscanf, please help me. Alternative approaches are also welcome.
Thanks in advance.

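fscanf(stream, "%41[^\n]", buffer);
is roughly equivalent to fgets(buffer, 42, stream), with two differences: it does not store the newline (it leaves it in the stream), and on an empty line it stores nothing and returns 0. The width does not count the terminating '\0', so a 42-byte buffer takes a width of 41. You can't replace the width with * to pass the buffer length as an argument (as you can in printf); in scanf, * means "suppress the assignment". So
fscanf(stream, "%*[^\n]");
fscanf(stream, "%*c");
reads up to and including the next end-of-line character. It takes two calls because in a single "%*[^\n]%*c" format an empty line makes the first conversion fail, and the newline is then never consumed.
Every conversion specifier other than [, c and n starts by skipping whitespace.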

Kernighan and Pike in the excellent book 'The Practice of Programming' show how to use sprintf() to create an appropriate format specifier including the length (similar to the examples in the answer by AProgrammer), and then use that in the call to scanf(). That way, you can also control the separators. Concerns about the 'inefficiency' of this approach are probably misguided - the alternatives are harder to get right.
That said, I usually don't use the scanf() family of functions for file I/O; I get the data into a string with some sort of 'get line' routine, and then use the sscanf() family of functions to split it up, or other more specialized parsing code.

Related

Reading specifically formatted string

I'm attempting to read a file containing lines of strings in the following format:
"string";"string";"string";"string";"string"
How do I read each of them using functions that work on both Windows and Linux?
The length of each string is unknown.
I have tried using fscanf like this:
fscanf(fp, "\"%s\";\"%s\";\"%s\";\"%s\";\"%s\"\n");
But the first string picked up the whole line.
If you really want to use fscanf, you could use a format string like this:
fscanf(fp, "\"%[^\"]\";\"%[^\"]\";\"%[^\"]\";\"%[^\"]\";\"%[^\"]\"\n", ...);
For more details, read up on the [set] conversion specifier in the reference docs for fscanf.
Note that this will not work with embedded '"' characters in the strings, nor with empty strings (""), because a %[ conversion must match at least one character.
This also leaves no flexibility (like additional whitespace around the semicolons, optional quotes, etc.).
In case those limitations are problematic for you, you'll want a more intelligent parser (libcsv comes to mind, for example). Also see pmg's answer for how to roll your own.
Here's some pseudo-code for you:
loop
    getchar; if not a quote, exit with error
    loop
        getchar; mind EOF
        if not a quote, add to string
        if a quote, exit inner loop
    use string
    getchar; if not a semicolon, exit with error unless EOF

Scan whole line from file in C Programming

I was writing a program to read multiple lines from a file.
The problem is that I don't know the length of the lines, so I can't use fgets, because I need to give it the size of the buffer, and I can't use fscanf, because it stops at a space.
I saw a solution that recommended using malloc and realloc for each character read, but I think there's an easier way. Then I found someone suggesting
fscanf(file,"%[^\n]",line);
Does anyone have a better solution, or can someone explain how the above works? (I haven't tested it.)
I use the GCC compiler, if that matters.
You can use getline(3). It allocates memory on your behalf, which you should free when you are finished reading lines.
and then i found someone suggesting using fscanf(file,"%[^\n]",line);
That's practically an unsafe version of fgets(line, sizeof line, file): nothing limits how many characters it stores, so a long line overflows line. Don't do that.
If you don't know the maximum line length, you have two options.
There's a LINE_MAX macro in <limits.h> (AFAIK it's POSIX-only, but some implementations may have equivalents). It's the longest input line that POSIX text utilities are required to handle, so it's often a fair assumption that lines won't exceed it.
You can go the "read and realloc" way, but you don't have to call realloc() for every character. A conventional solution to this problem is to grow the buffer exponentially, i.e. double the allocated memory whenever it's exhausted.
A simple conversion specification for scanf or fscanf has the form
%specifier
where, as we know, d is the specifier for integers. Two specifiers matter here:
[characters] is a scanset: it matches one or more characters, all of which appear between the brackets.
A dash (-) that is neither the first nor the last character in the set has implementation-defined meaning (often a range), so it is not portable.
[^characters] is a negated scanset: it matches one or more characters, none of which appear between the brackets.
So
fscanf(file,"%[^\n]",line);
reads characters until it meets one that is in the negated scanset, in this case the newline character. The newline itself stays in the stream, and if the line is empty nothing is read at all (the call returns 0).
As others have suggested, you can use getline() or fgets() instead.
The line fscanf(file,"%[^\n]",line); reads everything other than \n into line. This should work on Linux and on Windows (in text mode, Windows translates \r\n line endings to \n). It may not work with files that end lines with a bare \r, as classic Mac OS (before OS X) did; OS X itself uses \n.

In C, should I use FLUSH every time I use scanf to get rid of the buffer?

As the title says, should I use
while(getchar() != '\n');
every time I use scanf?
And can someone explain the logic behind
while(getchar() != '\n');
Thanks.
No, you generally don't need to do that. The loop you posted reads characters from stdin and throws them away until it reads a \n; the newline itself is discarded too. One flaw: if the input ends before a newline, getchar() keeps returning EOF and the loop never terminates, so a robust version should also stop at EOF.
Typical problems, or the need for "flushing", can be avoided by:
Not mixing scanf with other input methods. For example, don't mix it with fgets.
Preceding a conversion specifier with a space when it doesn't skip whitespace on its own (%c, %[ and %n) and you want whitespace skipped.
For example, to skip blanks (including a newline left behind by a previous scanf), use scanf(" %c", ...) instead of scanf("%c", ...).
That aside, when you have complex input to read, you might want to:
Read entire lines with fgets, which you can then parse as you please with sscanf, strtok et al. It may look like a contradiction to recommend sscanf where scanf is inadequate, but once you have the full line stored safely with fgets, you have considerably more freedom to analyze it, throw away portions that don't match, do a strchr here and there, etc.
Use languages (with libraries) better suited to the job, like Python or Perl, to reduce the task to a simpler problem.
Use a full-blown lexer

Use scanf with Regular Expressions

I've been trying to use regular expressions on scanf, in order to read a string of maximum n characters and discard anything else until the New Line Character. Any spaces should be treated as regular characters, thus included in the string to be read.
I've studied a Wikipedia article about Regular Expressions, yet I can't get scanf to work properly. Here is some code I've tried:
scanf("[ ]*%ns[ ]*[\n]", string);
[ ] is supposed to match the actual space character, * is supposed to mean zero or more, n is the number of characters to read, and string is a pointer allocated with malloc.
I have tried several different combinations; however I tend to get only the first word of a sentence read (stops at space character). Furthermore, * seems to discard a character instead of meaning "zero or more"...
Could anybody explain in detail how regular expressions are interpreted by scanf? What is more, is it efficient to use getc repetitively instead?
Thanks in Advance :D
The short answer: scanf does not handle regular expressions literally speaking.
If you want to use regular expressions in C, you could use the regex POSIX library. See the following question for a basic example on this library usage : Regular expressions in C: examples?
Now if you want to do it the scanf way you could try something like
scanf("%*[ ]%ns%*[ ]\n",str);
Replace the n in %ns with the maximum number of characters to read from the input stream.
The %*[ ] part skips a run of spaces without storing it. Note that it needs at least one space to succeed; otherwise the scan stops there. You can put a width after the * (for example %*3[ ]) to skip at most that many characters, and you can add other characters between the brackets to skip more than just spaces.
I doubt the above scanf does what you want, though: %s skips leading whitespace and stops at the first space, so it can never read a string that contains spaces.
I would definitely go with an fgets call, then trim the surrounding whitespace with something like the following: How do I trim leading/trailing whitespace in a standard way?
is it efficient to use getc repetitively instead?
Depends somewhat on the application, but YES, repeated getc() is efficient.
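Unless I read the question wrong, %[^\n] will save everything until the newline is encountered. Don't put quotes inside the brackets (%[^'\n'] would also stop at a ' character), and don't add an s after the closing bracket, because it would just match a literal s.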

Force fscanf to Consume Possible Whitespace

I have a multiline TSV file with the following format:
Type\tBasic Name\tAttribute\tA Long Description\n
As you can see, the Basic Name and the Description can both contain some number of spaces. I am trying to read each line in and extract the elements. For now, I've narrowed it down to just extracting the basic name. My fscanf is as follows:
fscanf(file_in, "%*[^ ]s\t%128[^ ]s\t%*[^ ]s\t%[^ ]s\n", name_string, desc_string);
This doesn't work as I have hoped, and I'm having trouble narrowing down the error. Does anyone know how I could read in the lines properly?
I mostly agree with Pablo (that the scanf family don't make great parsers), but it's worth understanding how to write a scanf pattern. The pattern you're looking for is something like this:
fscanf(file_in, " %*[^\t] %128[^\t] %*[^\t] %128[^\n]", name_string, desc_string);
Notes:
%[xyz] is a directive. %[xyz]s is two directives, the second of which matches a literal s.
As far as I know, there is no whitespace directive that matches exactly one tab character, since any whitespace in the pattern matches any amount of whitespace (including none) in the input. (A suppressed conversion such as %*1[\t] does consume exactly one tab, but it doesn't help with empty fields, because %[^\t] itself fails when it matches nothing.) I used a space in my example, which will match a terminating tab, but it will also match any number of consecutive tabs, so empty fields won't be parsed correctly.
The 128-character limit does not include the terminating NUL character, so name_string and desc_string must each hold at least 129 bytes.
Also, if the scan stops because the character limit is exceeded, it won't skip the rest of the field automatically, so you'll end up out of sync with the input.
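A better approach explicitly skips the remaining characters in a field, if necessary. That can't be done in a single format, because a %*[^\t] with nothing left to skip is a matching failure that ends the whole call, so split it into separate calls:
fscanf(file_in, " %*[^\t] %128[^\t]", name_string);
fscanf(file_in, "%*[^\t]");
fscanf(file_in, " %*[^\t] %128[^\n]", desc_string);
fscanf(file_in, "%*[^\n]");
An even better solution would be to use an allocating modifier (m in POSIX, or a as an older GNU extension) and get fscanf to malloc the memory for you.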
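I'd rather use strtok for this. It's more robust than fscanf, since the scanf family only works when the format is 100% correct; otherwise you end up missing values. (Be aware, though, that strtok treats consecutive delimiters as one, so it also skips empty fields.)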
Take a look at Parallel to PHP's "explode" in C: Split char* into char* using delimiter, where I explain in more detail how to use strtok.
So, read each line with fgets and parse it with strtok.
First, as has already been noted, %[] is a conversion specifier by itself. There's no s after the []. The s characters in your format string are not part of the conversion specifiers; they match literal s characters in the input. You have to get rid of them.
Second, as you said yourself, your file is tab-separated. That means you should extract the contiguous portions of each line with the %[^\t] conversion specifier (or %[^\n] for the last portion). Why did you use %[^ ], and how did you expect it to work? %[^ ] stops at a space character, which is the opposite of what you wanted.
In your example the proper combination of specifiers would be
fscanf(file_in, "%*[^\t]\t%128[^\t]\t%*[^\t]\t%128[^\n]\n", name_string, desc_string);
(with both buffers at least 129 bytes long). This format string assumes that all 4 portions of the string are present and non-empty, and that the last portion is terminated by \n. Note also that each \t in the format matches any amount of whitespace, not just one tab.
