I'm trying to write a function that takes a stream as an argument and reads from that stream and reads in the contents of the file up to the first whitespace as defined by the isspace function, and then uses the strtok function to parse the string. I'm not sure how to start it though with which function to read a line and ignore the whitespace. I know fgetc and getc only read one character at a time, and looking up the fscanf reference, will that work? Or does that only find items in your stream according to the format specifiers %s? Thanks!
To read an entire line at a time, you generally should use fgets. Some care is needed in case a line in the stream is longer than your buffer, the leftover will remain in the stream, which might not be what you want. (If you want to ignore the rest of the line, you can use fgets followed by fscanf as described at http://home.datacomm.ch/t_wolf/tw/c/getting_input.html.)
If you want to read in an entire line without worrying about a buffer size, you may wish to look into Chuck Falconer's ggets function which dynamically allocates a buffer for you (this does mean that you are responsible for freeing it).
C Answer
The fscanf s conversion will match a sequence of non-white-space characters. The input string stops at white space (as defined by isspace) or at the maximum field width, whichever occurs first. Note that there must be enough space in the provided buffer or it may be overflowed by long input.
FILE *fp;
char cstr[128];
fp = fopen("test.txt", "r");
while (!feof(fp))
{
fscanf(fp, "%s", cstr);
...
}
Original C Answer
The fgets function will allow you to read in the file one line at a time, but you will still need to check each character with isspace.
Since isspace may include space, form-feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v') in its check for white-space characters, your best bet may be to read one character at a time in a loop using the fgetc function. Note that if the integer value returned by fgetc() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.
FILE *fp;
int c;
fp = fopen("test.txt", "r");
while ((c = fgetc(fp)) != EOF)
{
if (isspace(c))
{
...
}
else
{
...
}
}
Original C++ Answer
The istream::getline method will allow you to read in one line at a time and optionally specify the delimiter (default is '/n').
Since isspace may include space, form-feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v') in its check for white-space characters, your best bet may be to read one character at a time in a loop using the istream::get method.
char c;
string str;
ifstream file("test.txt",ios::in);
while (file.get(c))
{
if (isspace((unsigned char)c))
{
...
}
else
{
str.push_back(c);
}
file.peek();
if (file.eof())
{
break;
}
}
Note: Error checking was omitted from the all of the above code for simplicity.
well although getc/fgetc only retrieve 1 character at a time, you can put them into a loop right ?:)
Related
i have tried to use k = getchar() but it doesn't work too;
here is my code
#include<stdio.h>
int main()
{
float height;
float k=0;
do
{
printf("please type a value..\n");
scanf("%f",&height);
k=height;
}while(k<0);// i assume letters and non positive numbers are below zero.
//so i want the loop to continue until one types a +ve float.
printf("%f",k);
return 0;
}
i want a if a user types letters or negative numbers or characters he/she should be prompted to type the value again until he types a positive number
Like Govind Parmar already suggested, it is better/easier to use fgets() to read a full line of input, rather than use scanf() et al. for human-interactive input.
The underlying reason is that the interactive standard input is line-buffered by default (and changing that is nontrivial). So, when the user starts typing their input, it is not immediately provided to your program; only when the user presses Enter.
If we do read each line of input using fgets(), we can then scan and convert it using sscanf(), which works much like scanf()/fscanf() do, except that sscanf() works on string input, rather than an input stream.
Here is a practical example:
#include <stdlib.h>
#include <stdio.h>
#define MAX_LINE_LEN 100
int main(void)
{
char buffer[MAX_LINE_LEN + 1];
char *line, dummy;
double value;
while (1) {
printf("Please type a number, or Q to exit:\n");
fflush(stdout);
line = fgets(buffer, sizeof buffer, stdin);
if (!line) {
printf("No more input; exiting.\n");
break;
}
if (sscanf(line, " %lf %c", &value, &dummy) == 1) {
printf("You typed %.6f\n", value);
continue;
}
if (line[0] == 'q' || line[0] == 'Q') {
printf("Thank you; now quitting.\n");
break;
}
printf("Sorry, I couldn't parse that.\n");
}
return EXIT_SUCCESS;
}
The fflush(stdout); is not necessary, but it does no harm either. It basically ensures that everything we have printf()'d or written to stdout, will be flushed to the file or device; in this case, that it will be displayed in the terminal. (It is not necessary here, because standard output is also line buffered by default, so the \n in the printf pattern, printing a newline, also causes the flush.
I do like to sprinkle those fflush() calls, wherever I need to remember that at this point, it is important for all output to be actually flushed to output, and not cached by the C library. In this case, we definitely want the prompt to be visible to the user before we start waiting for their input!
(But, again, because that printf("...\n"); before it ends with a newline, \n, and we haven't changed the standard output buffering, the fflush(stdout); is not needed there.)
The line = fgets(buffer, sizeof buffer, stdin); line contains several important details:
We defined the macro MAX_LINE_LEN earlier on, because fgets() can only read a line as long as the buffer it is given, and will return the rest of that line in following calls.
(You can check if the line read ended with a newline: if it does not, then either it was the final line in an input file that does not have a newline at the end of the last line, or the line was longer than the buffer you have, so you only received the initial part, with the rest of the line still waiting for you in the buffer.)
The +1 in char buffer[MAX_LINE_LEN + 1]; is because strings in C are terminated by a nul char, '\0', at end. So, if we have a buffer of 19 characters, it can hold a string with at most 18 characters.
Note that NUL, or nul with one ell, is the name of the ASCII character with code 0, '\0', and is the end-of-string marker character.
NULL (or sometimes nil), however, is a pointer to the zero address, and in C99 and later is the same as (void *)0. It is the sentinel and error value we use, when we want to set a pointer to a recognizable error/unused/nothing value, instead of pointing to actual data.
sizeof buffer is the number of chars, total (including the end-of-string nul char), used by the variable buffer.
In this case, we could have used MAX_LINE_LEN + 1 instead (the second parameter to fgets() being the number of characters in the buffer given to it, including the reservation for the end-of-string char).
The reason I used sizeof buffer here, is because it is so useful. (Do remember that if buffer was a pointer and not an array, it would evaluate to the size of a pointer; not the amount of data available where that pointer points to. If you use pointers, you will need to track the amount of memory available there yourself, usually in a separate variable. That is just how C works.)
And also because it is important that sizeof is not a function, but an operator: it does not evaluate its argument, it only considers the size (of the type) of the argument. This means that if you do something silly like sizeof (i++), you'll find that i is not incremented, and that it yields the exact same value as sizeof i. Again, this is because sizeof is an operator, not a function, and it just returns the size of its argument.
fgets() returns a pointer to the line it stored in the buffer, or NULL if an error occurred.
This is also why I named the pointer line, and the storage array buffer. They describe my intent as a programmer. (That is very important when writing comments, by the way: do not describe what the code does, because we can read the code; but do describe your intent as to what the code should do, because only the programmer knows that, but it is important to know that intent if one tries to understand, modify, or fix the code.)
The scanf() family of functions returns the number of successful conversions. To detect input where the proper numeric value was followed by garbage, say 1.0 x, I asked sscanf() to ignore any whitespace after the number (whitespace means tabs, spaces, and newlines; '\t', '\n', '\v', '\f', '\r', and ' ' for the default C locale using ASCII character set), and try to convert a single additional character, dummy.
Now, if the line does contain anything besides whitespace after the number, sscanf() will store the first character of that anything in dummy, and return 2. However, because I only want lines that only contain the number and no dummy characters, I expect a return value of 1.
To detect the q or Q (but only as the first character on the line), we simply examine the first character in line, line[0].
If we included <string.h>, we could use e.g. if (strchr(line, 'q') || strchr(line, 'Q')) to see if there is a q or Q anywhere in the line supplied. The strchr(string, char) returns a pointer to the first occurrence of char in string, or NULL if none; and all pointers but NULL are considered logically true. (That is, we could equivalently write if (strchr(line, 'q') != NULL || strchr(line, 'Q') != NULL).)
Another function we could use declared in <string.h> is strstr(). It works like strchr(), but the second parameter is a string. For example, (strstr(line, "exit")) is only true if line has exit in it somewhere. (It could be brexit or exitology, though; it is just a simple substring search.)
In a loop, continue skips the rest of the loop body, and starts the next iteration of the loop body from the beginning.
In a loop, break skips the rest of the loop body, and continues execution after the loop.
EXIT_SUCCESS and EXIT_FAILURE are the standard exit status codes <stdlib.h> defines. Most prefer using 0 for EXIT_SUCCESS (because that is what it is in most operating systems), but I think spelling the success/failure out like that makes it easier to read the code.
I wouldn't use scanf-family functions for reading from stdin in general.
fgets is better since it takes input as a string whose length you specify, avoiding buffer overflows, which you can later parse into the desired type (if any). For the case of float values, strtof works.
However, if the specification for your deliverable or homework assignment requires the use of scanf with %f as the format specifier, what you can do is check its return value, which will contain a count of the number of format specifiers in the format string that were successfully scanned:
§ 7.21.6.2:
The [scanf] function returns the value of the macro EOF if an input failure occurs
before the first conversion (if any) has completed. Otherwise, the function returns the
number of input items assigned, which can be fewer than provided for, or even zero, in
the event of an early matching failure.
From there, you can diagnose whether the input is valid or not. Also, when scanf fails, stdin is not cleared and subsequent calls to scanf (i.e. in a loop) will continue to see whatever is in there. This question has some information about dealing with that.
How to scan total line from user input with c program?
I tried scanf("%99[^\n]",st), but it is not working when I scan something before this scan statment.It worked if this is the first scan statement.
How to scan total line from user input with c program?
There are many ways to read a line of input, and your usage of the word scan suggests you're already focused on the scanf() function for the job. This is unfortunate, because, although you can (to some extent) achieve what you want with scanf(), it's definitely not the best tool for reading a line.
As already stated in the comments, your scanf() format string will stop at a newline, so the next scanf() will first find that newline and it can't match [^\n] (which means anything except newline). As a newline is just another whitespace character, adding a blank in front of your conversion will silently eat it up ;)
But now for the better solution: Assuming you only want to use standard C functions, there's already one function for exactly the job of reading a line: fgets(). The following code snippet should explain its usage:
char line[1024];
char *str = fgets(line, 1024, stdin); // read from the standard input
if (!str)
{
// couldn't read input for some reason, handle error here
exit(1); // <- for example
}
// fgets includes the newline character that ends the line, but if the line
// is longer than 1022 characters, it will stop early here (it will never
// write more bytes than the second parameter you pass). Often you don't
// want that newline character, and the following line overwrites it with
// 0 (which is "end of string") **only** if it was there:
line[strcspn(line, "\n")] = 0;
Note that you might want to check for the newline character with strchr() instead, so you actually know whether you have the whole line or maybe your input buffer was to small. In the latter case, you might want to call fgets() again.
How to scan total line from user input with c program?
scanf("%99[^\n]",st) reads a line, almost.
With the C Standard Library a line is
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2
scanf("%99[^\n]",st) fails to read the end of the line, the '\n'.
That is why on the 2nd call, the '\n' remains in stdin to be read and scanf("%99[^\n]",st) will not read it.
There are ways to use scanf("%99[^\n]",st);, or a variation of it as a step in reading user input, yet they suffer from 1) Not handling a blank line "\n" correctly 2) Missing rare input errors 3) Long line issues and other nuances.
The preferred portable solution is to use fgets(). Loop example:
#define LINE_MAX_LENGTH 200
char buf[LINE_MAX_LENGTH + 1 + 1]; // +1 for long lines detection, +1 for \0
while (fgets(buf, sizeof buf, stdin)) {
size_t eol = strcspn(buf, "\n"); **
buf[eol] = '\0'; // trim potential \n
if (eol >= LINE_MAX_LENGTH) {
// IMO, user input exceeding a sane generous threshold is a potential hack
fprintf(stderr, "Line too long\n");
// TBD : Handle excessive long line
}
// Use `buf[[]`
}
Many platforms support getline() to read a line.
Short-comings: Non C-standard and allow a hacker to overwhelm system resources with insanely long lines.
In C, there is not a great solution. What is best depends on the various coding goals.
** I prefer size_t eol = strcspn(buf, "\n\r"); to read lines in a *nix environment that may end with "\r\n".
scanf() should never be used for user input. The best way to get input from the user is with fgets().
Read more: http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
char str[1024];
char *alline = fgets(str, 1024, stdin);
scanf("%[^'\n']s",alline);
I think the correct solution should be like this. It is worked for me.
Hope it helps.
I'm trying to read a line using the following code:
while(fscanf(f, "%[^\n\r]s", cLine) != EOF )
{
/* do something with cLine */
}
But somehow I get only the first line every time. Is this a bad way to read a line? What should I fix to make it work as expected?
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure.
I prefer to use fgets() to get each line in and then sscanf() that. You can then continue to examine the line read in as you see fit. Something like:
#define LINESZ 1024
char buff[LINESZ];
FILE *fin = fopen ("infile.txt", "r");
if (fin != NULL) {
while (fgets (buff, LINESZ, fin)) {
/* Process buff here. */
}
fclose (fin);
}
fgets() appears to be what you're trying to do, reading in a string until you encounter a newline character.
If you want read a file line by line (Here, line separator == '\n') just make that:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
FILE *fp;
char *buffer;
int ret;
// Open a file ("test.txt")
if ((fp = fopen("test.txt", "r")) == NULL) {
fprintf(stdout, "Error: Can't open file !\n");
return -1;
}
// Alloc buffer size (Set your max line size)
buffer = malloc(sizeof(char) * 4096);
while(!feof(fp))
{
// Clean buffer
memset(buffer, 0, 4096);
// Read a line
ret = fscanf(fp, "%4095[^\n]\n", buffer);
if (ret != EOF) {
// Print line
fprintf(stdout, "%s\n", buffer);
}
}
// Free buffer
free(buffer);
// Close file
fclose(fp);
return 0;
}
Enjoy :)
If you try while( fscanf( f, "%27[^\n\r]", cLine ) == 1 ) you might have a little more luck. The three changes from your original:
length-limit what gets read in - I've used 27 here as an example, and unfortunately the scanf() family require the field width literally in the format string and can't use the * mechanism that the printf() can for passing the value in
get rid of the s in the format string - %[ is the format specifier for "all characters matching or not matching a set", and the set is terminated by a ] on its own
compare the return value against the number of conversions you expect to happen (and for ease of management, ensure that number is 1)
That said, you'll get the same result with less pain by using fgets() to read in as much of a line as will fit in your buffer.
Using fscanf to read/tokenise a file always results in fragile code or pain and suffering. Reading a line, and tokenising or scanning that line is safe, and effective. It needs more lines of code - which means it takes longer to THINK about what you want to do (and you need to handle a finite input buffer size) - but after that life just stinks less.
Don't fight fscanf. Just don't use it. Ever.
It looks to me like you're trying to use regex operators in your fscanf string. The string [^\n\r] doesn't mean anything to fscanf, which is why your code doesn't work as expected.
Furthermore, fscanf() doesn't return EOF if the item doesn't match. Rather, it returns an integer that indicates the number of matches--which in your case is probably zero. EOF is only returned at the end of the stream or in case of an error. So what's happening in your case is that the first call to fscanf() reads all the way to the end of the file looking for a matching string, then returns 0 to let you know that no match was found. The second call then returns EOF because the entire file has been read.
Finally, note that the %s scanf format operator only captures to the next whitespace character, so you don't need to exclude \n or \r in any case.
Consult the fscanf documentation for more information: http://www.cplusplus.com/reference/clibrary/cstdio/fscanf/
Your loop has several issues. You wrote:
while( fscanf( f, "%[^\n\r]s", cLine ) != EOF )
/* do something */;
Some things to consider:
fscanf() returns the number of items stored. It can return EOF if it reads past the end of file or if the file handle has an error. You need to distinguish a valid return of zero in which case there is no new content in the buffer cLine from a successfully read.
You do a have a problem when a failure to match occurs because it is difficult to predict where the file handle is now pointing in the stream. This makes recovery from a failed match harder to do than might be expected.
The pattern you wrote probably doesn't do what you intended. It is matching any number of characters that are not CR or LF, and then expecting to find a literal s.
You haven't protected your buffer from an overflow. Any number of characters may be read from the file and written to the buffer, regardless of the size allocated to that buffer. This is an unfortunately common error, that in many cases can be exploited by an attacker to run arbitrary code of the attackers choosing.
Unless you specifically requested that f be opened in binary mode, line ending translation will happen in the library and you will generally never see CR characters, and usually not in text files.
You probably want a loop more like the following:
while(fgets(cLine, N_CLINE, f)) {
/* do something */ ;
}
where N_CLINE is the number of bytes available in the buffer starting a cLine.
The fgets() function is a much preferred way to read a line from a file. Its second parameter is the size of the buffer, and it reads up to 1 less than that size bytes from the file into the buffer. It always terminates the buffer with a nul character so that it can be safely passed to other C string functions.
It stops on the first of end of file, newline, or buffer_size-1 bytes read.
It leaves the newline character in the buffer, and that fact allows you to distinguish a single line longer than your buffer from a line shorter than the buffer.
It returns NULL if no bytes were copied due to end of file or an error, and the pointer to the buffer otherwise. You might want to use feof() and/or ferror() to distinguish those cases.
i think the problem with this code is because when you read with %[^\n\r]s, in fact, you reading until reach '\n' or '\r', but you don't reading the '\n' or '\r' also.
So you need to get this character before you read with fscanf again at loop.
Do something like that:
do{
fscanf(f, "%[^\n\r]s", cLine) != EOF
/* Do something here */
}while(fgetc(file) != EOF)
I'm working on a program that reads text from a file and parses the text to words and manipulates them. I'm parsing with fscanf like that
while (fscanf (fp, " %32[^ ,.\t\n]%*c", word) == 1)
{
/*manipulate the text word by word */
…
}
I wanna write next to each word that I find in which line I found it.
Is there a way that I can check when I moved down a line
when using the function fscanf?
The soundest advice is to use fgets() or perhaps POSIX
getline() to read lines and then consider using
sscanf() to parse each line. You will probably need to consider how to use sscanf() in a loop. There are also numerous other options for parsing the line instead of sscanf(), such as strtok_r() or the less desirable strtok() — or, on Windows, strtok_s();
strspn(),
strcspn(),
strpbrk(); and other functions that are not as standardized.
If you feel you must use fscanf(), then you probably need to capture the trailing context. A simple version of that would be:
char c;
while (fscanf(fp, " %32[^ ,.\t\n]%c", word, &c) == 2)
…
This captures the character after the word, assuming there is one. If your file doesn't end with a newline, it is possible a word will be lost. It's also rather too easy to miss a newline. For example, if the line ends with a full stop (period) before the newline, then c will hold the . and the newline will be skipped by the next iteration of the loop. You could overcome that with:
char s[33];
while (fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s) == 2)
…
Note that the length in the format string must be one less than the length in the variable declaration!
After a successful call to fscanf(), the string s could contain multiple newlines and blanks and so on. The fscanf() functions mostly don't care about newlines, and the scan set for s would read multiple newlines in a row if that's what's in the data file.
If you explicitly capture the status from fscanf(), you can be more sensitive to files that end without a newline (or a punctuation character), or that cause other problems:
char s[33];
int rc;
while ((rc = fscanf(fp, " %32[^ ,.\t\n]%32[ ,.\t\n]", word, s)) != EOF)
{
switch (rc)
{
case 2:
…proceed as normal, checking s for newlines.
break;
case 1:
…probably an overlong word or EOF without a newline.
break;
case 0:
…probably means the next character is one of comma or dot.
…spaces, tabs, newlines will be skipped without detection
…by the leading space in the format string.
break;
default:
assert(0);
break;
}
}
If you start to care about !, ?, ;, :, ' or " characters — not to mention ( and ) — life gets more complex still. In fact, at that point, the alternatives to sscanf() start looking much better.
It is very hard to use the scanf() family of functions correctly. They're anything but tools for the novice, at least once you start needing to do anything complex. You could look at A beginner's guide to not using scanf(), which contains much valuable information. I'm not wholly convinced by the last couple of examples which are supposed to be bomb-proof uses of scanf(). (It is a little easier to use sscanf() correctly, but you still need to understand what you're up to in detail.)
Read lines with fgets() and then parse them using sscanf:
char buff[1024];
int lineno = 0;
int offset = 0;
while (fgets(buff, 1024, fp)) {
lineno++;
offset = 0;
while (sscanf(buff + offset, " %32[^ ,.\t\n]%*c", word) == 1)
{
/* manipulate the text word by word */
}
}
In second loop you must increase buffer offset appropriately in order to parse line correctly. for this you can use %n for example in order to get read bytes.
I was wondering if it is possible to only read in particular parts of a string using scanf.
For example since I am reading from a file i use fscanf
if I wanted to read name and number (where number is the 111-2222) when they are in a string such as:
Bob Hardy:sometext:111-2222:sometext:sometext
I use this but its not working:
(fscanf(read, "%23[^:] %27[^:] %10[^:] %27[^:] %d\n", name,var1, number, var2, var3))
Your initial format string fails because it does not consume the : delimiters.
If you want scanf() to read a portion of the input, but you don't care what is actually read, then you should use a field descriptor with the assignment-suppression flag (*):
char nl;
fscanf(read, "%23[^:]:%*[^:]:%10[^:]%*[^\n]%c", name, number, &nl);
As a bonus, you don't need to worry about buffer overruns for fields with assignment suppressed.
You should not attempt to match a single newline via a trailing newline character in the format, because a literal newline (or space or tab) in the format will match any run of whitespace. In this particular case, it would consume not just the line terminator but also any leading whitespace on the next line.
The last field is not suppressed, even though it will almost always receive a newline, because that way you can tell from the return value if you've scanned the last line of the file and it is not newline-terminated.
Check fscanf() return value.
fscanf(read, "%23[^:] %27[^:] ... is failing because after scanning the first field with %23[^:], fscanf() encounters a ':'. Since that does not match the next part of the format, a white-space as in ' ', scanning stops.
Had code checked the returned value of fscanf(), which was certainly 1, it may have been self-evident the source of the problem. So the scanning needs to consume the ':', add it to the format: "%23[^:]: %27[^:]: ...
Better to use fgets()
Using fscanf() to read data and detect properly and improperly formatted data is very challenging. It can be done correctly to scan expected input. Yet it rarely works to handle some incorrectly formated input.
Instead, simple read a line of data and then parse it. Using '%n' is an easy way to detect complete conversion as it saves the char scan count - if scanning gets there.
char buffer[200];
if (fgets(buffer, sizeof buffer, read) == NULL) {
return EOF;
}
int n = 0;
sscanf(buffer, " %23[^:]: %27[^:]: %10[^:]: %27[^:]:%d %n",
name, var1, number, var2, &var3, &n);
if (n == 0) {
return FAIL; // scan incomplete
}
if (buffer[n]) {
return FAIL; // Extra data on line
}
// Success!
Note: sample input ended with text, but original format used "%d". Unclear on OP's intent.