I'm reading in a .txt file. I'm using fscanf to get the data as it is formatted.
The line I'm having problems with is this:
result = fscanf(fp, "%s", ap->name);
This is fine until I have a name with a whitespace eg: St Ives
So I use this to read in the white space:
result = fscanf(fp, "%[^\n]s", ap->name);
However, when I try to read in the first name (with no white space) it just doesn't work and messes up the other fscanf.
But I use the [^\n] it works fine within a different file I'm using. Not sure what is happening.
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Edit//
Ok, so if I use:
result = fscanf(fp, "%s", ap->name);
result = fscanf(fp, "%[^\n]s", ap->name);
This allows me to read in a string with no white space. But When I get a "name" with whitespace it doesn't work.
One problem with this:
result = fscanf(fp, "%[^\n]s", ap->name);
is that you have an extra s at the end of your format specifier. The entire format specifier should just be %[^\n], which says "read in a string which consists of characters which are not newlines". The extra s is not part of the format specifier, so it's interpreted as a literal: "read the next character from the input; if it's an "s", continue, otherwise fail."
The extra s doesn't actually hurt you, though. You know exactly what the next character of input: a newline. It doesn't match, and input processing stops there, but it doesn't really matter since it's the end of your format specifier. This would cause problems, though, if you had other format specifiers after this one in the same format string.
The real problem is that you're not consuming the newline: you're only reading in all of the characters up to the newline, but not the newline itself. To fix that, you should do this:
result = fscanf(fp, "%[^\n]%*c", ap->name);
The %*c specifier says to read in a character (c), but don't assign it to any variable (*). If you omitted the *, you would have to pass fscanf() another parameter containing a pointer to a character (a char*), where it would then store the resulting character that it read in.
You could also use %[^\n]\n, but that would also read in any whitespace which followed the newline, which may not be what you want. When fscanf finds whitespace in its format specifier (a space, newline, or tab), it consumes as much whitespace as it can (i.e. you can think of it consuming the longest string that matches the regular expression [ \t\n]*).
Finally, you should also specify a maximum length to avoid buffer overruns. You can do this by placing the buffer length in between the % and the [. For example, if ap->name is a buffer of 256 characters, you should do this:
result = fscanf(fp, "%255[^\n]%*c", ap->name);
This works great for statically allocated arrays; unfortunately, if the array is dyamically sized at runtime, there's no easy to way to pass the buffer size to fscanf. You'll have to create the format string with sprintf, e.g.:
char format[256];
snprintf(format, sizeof(format), "%%%d[^\n]%%*c", buffer_size - 1);
result = fscanf(fp, format, ap->name);
Jumm wrote:
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Which is a far easier problem to solve so go with it:
fgets( ap->name, MAX, fp ) ;
nlptr = strrchr ( ap->name, '\n' ) ;
if( nlptr != 0 )
{
*nlptr = '\0' ;
}
I'm not sure how you mean [^\n] is suppose to work. [] is a modifier which says "accept one character except any of the characters which is inside this block". The ^ inverts the condition. %s with fscanf only reads until it comes across a delimiter. For strings with spaces and newlines in them, use a combination of fgets and sscanf instead, and specify a restriction on the length.
There is no such thing as I gather you are trying to imply a regular expression in the fscanf function which does not exist, not that to my knowledge nor have I seen it anywhere - enlighten me on this.
The format specifier for reading a string is %s, it could be that you need to do it this way, %s\n which will pick up the newline.
But for pete's sake do not use the standard old gets family functions as specified by Clifford's answer above as that is where buffer overflows happen and was used in a infamous worm of the 1990's - the Morris Worm, more specifically in the fingerd daemon, that used to call gets that caused chaos. Fortunately, now, that has now been patched. And furthermore, a lot of programmers have been drilled into the mentality not to use the function.
Even Microsoft has adopted a safe version of gets family of functions, that specifies a parameter to indicate the length of buffer instead.
EDIT
My bad - I did not realize that Clifford indeed has specified the max length for input...Whoops! Sorry! Clifford's answer is correct! So +1 to Clifford's answer.
Thanks Neil for pointing out my error...
Hope this helps,
Best regards,
Tom.
I found the problem.
As Paul Tomblin said, I had an extra new line character in the field above. So using what tommieb75 said I used:
result = fscanf(fp, "%s\n", ap->code);
result = fscanf(fp, "%[^\n]s", ap->name);
And this fixed it!
Thanks for your help.
Related
I'm trying to read a string which consists of a set of numbers followed by a string, wrapped with some other basic text.
In other words, the format of the line is something like this:
Stuff<5,10,-5,8,"Test string here.">
Naively, I tried:
sscanf(str,"Stuff<%d,%d,%d,%d,\"%s\">",&i1,&i2,&i3,&i4,str2);
But after some research I discovered %s is supposed to stop parsing when it gets to a whitespace character. I found this question, but none of the answers addresses the problem I have: the string could contain any character in it, including newline characters and properly escaped quotes. The latter is not a problem, if I can just get sscanf to put everything after the first quote in the pre-allocated buffer I provide, I can strip the end off myself.
But how do I do this? I can't use %[] because it requires something in it to terminate the string, and the only thing I want to terminate it is the null terminator. So I thought, "Hey, I'll just use the null terminator!" But %[\0] made the compiler grumpy:
warning: no closing ‘]’ for ‘%[’ format
warning: embedded ‘\0’ in format
warning: no closing ‘]’ for ‘%[’ format
warning: embedded ‘\0’ in format
Using something like %*c won't work either, because I don't know exactly how many characters need to be taken. I tried passing strlen(str) since it will be less than that, but sscanf returns 4 and nothing is put into str2, suggesting that perhaps because the length was too long it gave up and didn't bother.
Update: I guess I could do something like:
sscanf(str,"Stuff<%d,%d,%d,%d,\"%n",&i1,&i2,&i3,&i4,&n);
str2 = str+n;
Your update seems to be a good answer. I was going to suggest strchr to find the location of the first quote char, after using sscanf to get i1 thru i4. Side note, you should always check the return value from sscanf to make sure that the conversions worked. This is even more important with your suggested answer, since n will be left uninitialized if the first four conversions aren't successful.
Scan for '\"', then for everything not '\"', then '\"' again.
Be sure to check sscanf() result and limit how long the test string may be.
char test_string[100];
int n = 0;
if (sscanf(str, "Stuff<%d,%d,%d,%d, \"%99[^\"]\"> %n",
&i1, &i2, &i3, &i4, test_string, &n) == 5 && str[n] == '\0') Good();
Your attempt using "...%[\0]...", from sscanf() point-of-view, is "...%[".
Everything in the format from "\0" on is ignored.
Using the int n = 0, appending " %n" to the format string, appending &n to the parameters and checking str[n] == '\0' is a neat trick with sscanf() to insure the entire line parsed correctly. Note: "%n" does not add to sscanf() result.
This is not the only way to achieve what you want to achieve, but probably the neatest way to do it: You'll need to use the scansets. I won't tell you the solution directly with this answer, I'll explain how to use scansets as far as I know them, and you'll hopefully be able to do it yourself.
Scansets %[...] are like %s when it comes to assignment, they interpret values as characters and store them into character arrays. %s is whitespace-terminated, %[...] is the flexible version of that.
There are two ways of using the scanset, first one being without a preceding caret ^, second one being with a preceding caret ^.
When you use scanset without the preceding caret ^, the characters you put inside the brackets will be the only ones that will be read, stored and then left behind. As soon as scanf encounters a non-matching character, that %[...] will be over. For example:
// input: asdasdasdwasdasd
char s[100] = { 0 };
scanf( "%[das]", s );
printf( "%s", s );
// output: asdasdasd
When you use scanset with the preceding caret ^, the search is inversed. It reads, stores and leaves behind every character until it reaches any one of the characters that you've put down after the preceding caret ^. Example:
// input: abcdefgh^kekQ
char s[100] = { 0 };
scanf( "%[^Q^]", s );
printf( "%s", s );
// output: abcdefgh
Beware, remaining characters is still to be read inside the stream, the file pointer won't get beyond the character which caused termination. I.e. for the first one, getchar( ); would give a 'w', and for the second one it would give a '^'.
I hope this will be enough. If you still cannot find your way out, ask away, I can give you a solution.
I'm using fscanf and fprintf.
I tried to delimit the strings on each line by \t and to read it like so:
fscanf(fp,"%d\t%s\t%s",&t->num,&t->string1,&t->string2);
The file contents:
1[TAB]string1[TAB]some string[NEWLINE]
It does not read properly. If I printf("%d %s %s",t->num,t->string1,t->string2) I get:
1 string1 some
Also I get this compile warning:
warning: format specifies type 'char *' but the argument has type 'char (*)[15]' [-Wformat]
How can I fix this without using binary r/w?
I'm guessing the space in "some string" is the problem. fscanf() reading a string using %s stops at the first whitespace character. To include spaces, use something like:
fscanf(fp, "%d\t%[^\n\t]\t%[^\n\t]", &t->num, &t->string1, &t->string2);
See also a reference page for fscanf() and/or another StackOverflow thread on reading tab-delimited items in C.
[EDIT in response to your edit: You seem to also have a problem with the arguments you're passing into fscanf(). You will need to post the declarations of t->string1 to be sure, but it looks like string1 is an array of characters, and therefore you should remove the & from the fscanf() call...]
The %s conversion specification stops reading at the first white space, and tabs and blanks both count as white space.
If you want to read a string of non-tabs, you can use a 'scan set' conversion specifier:
if (fscanf(fp, "%d\t%[^\t\n]\t%[^\t\n]", &t->num, t->string1, t->string2) != 3)
...oops - format error in input data...
(I'd lay odds that omitting the & from the string arguments is correct.) The question was edited; I win. Dropping the & is necessary to avoid the compiler warning!
This still doesn't quite do what you expect. If there are blanks at the start of the second field, they'll be eaten by the \t in the format string. Any white space in the format string eats any white space (including newlines) in the input. The %[^\t] conversion specification won't get started until there's a character that isn't white space in the input. I'm also assuming you want your input limited by newlines. You can leave out the \n characters if you prefer.
Note that I checked that the fscanf() interpreted 3 fields. It is important to error check your inputs.
If you really want control, you should probably read whole lines with fgets() and then use sscanf() to parse the data.
About fgets() and sscanf(); can you expand about how it will give more control?
Suppose the input data is written
1234
a string with spaces
another string
spread out over multiple lines like that. With raw fscanf(), this will be acceptable input even though it is spread over 9 lines of input. With fgets(), you can read a single line, and then analyze it with sscanf(), and you'll know that the first line was not in the correct format. You can then decide what to do about it.
Also, since mafso called me on it in his comment, we should ensure that there are no buffer overflows by limiting the size of the strings that the scan sets match.
if (fscanf(fp, "%d\t%14[^\t\n]\t%14[^\t\n]", &t->num, t->string1, t->string2) != 3)
...oops - format error in input data...
I'm using the error message about char (*)[15] to deduce that 14 is the correct number to use. Note that unlike printf(), you can't specify the sizes via * notation (in the scanf()-family, * supresses assignment), so you have to create the format with the correct sizes. Further, the size you specify is the number of characters before the terminating null byte, so if the array is of size 15, the size you specify in the format string is 14, as shown.
I have been told that scanf should not be used when user inputs a string. Instead, go for gets() by most of the experts and also the users on StackOverflow. I never asked it on StackOverflow why one should not use scanf over gets for strings. This is not the actual question but answer to this question is greatly appreciated.
Now coming to the actual question. I came across this type of code -
scanf("%[^\n]s",a);
This reads a string until user inputs a new line character, considering the white spaces also as string.
Is there any problem if I use
scanf("%[^\n]s",a);
instead of gets?
Is gets more optimized than scanf function as it sounds, gets is purely dedicated to handle strings. Please let me know about this.
Update
This link helped me to understand it better.
gets(3) is dangerous and should be avoided at all costs. I cannot envision a use where gets(3) is not a security flaw.
scanf(3)'s %s is also dangerous -- you must use the "field width" specifier to indicate the size of the buffer you have allocated. Without the field width, this routine is as dangerous as gets(3):
char name[64];
scanf("%63s", name);
The GNU C library provides the a modifier to %s that allocates the buffer for you. This non-portable extension is probably less difficult to use correctly:
The GNU C library supports a nonstandard extension that
causes the library to dynamically allocate a string of
sufficient size for input strings for the %s and %a[range]
conversion specifiers. To make use of this feature, specify
a as a length modifier (thus %as or %a[range]). The caller
must free(3) the returned string, as in the following
example:
char *p;
int n;
errno = 0;
n = scanf("%a[a-z]", &p);
if (n == 1) {
printf("read: %s\n", p);
free(p);
} else if (errno != 0) {
perror("scanf");
} else {
fprintf(stderr, "No matching characters\n"):
}
As shown in the above example, it is only necessary to call
free(3) if the scanf() call successfully read a string.
Firstly, it is not clear what that s is doing in your format string. The %[^\n] part is a self-sufficient format specifier. It is not a modifier for %s format, as you seem to believe. This means that "%[^\n]s" format string will be interpreted by scanf as two independent format specifiers: %[^\n] followed by a lone s. This will direct scanf to read everything until \n is encountered (leaving \n unread), and then require that the next input character is s. This just doesn't make any sense. No input will match such self-contradictory format.
Secondly, what was apparently meant is scanf("%[^\n]", a). This is somewhat close to [no longer available] gets (or fgets), but it is not the same. scanf requires that each format specifiers matches at least one input character. scanf will fail and abort if it cannot match any input characters for the requested format specifier. This means that scanf("%[^\n]",a) is not capable of reading empty input lines, i.e. lines that contain \n character immediately. If you feed such a line into the above scanf, it will return 0 to indicate failure and leave a unchanged. That's very different from how typical line-based input functions work.
(This is a rather surprising and seemingly illogical properly of %[] format. Personally, I'd prefer %[] to be able to match empty sequences and produce empty strings, but that's not how standard scanf works.)
If you want to read the input in line-by-lane fashion, fgets is your best option.
I have been struggling to figure out the fscanf formatting. I just want to read in a file of words delimited by spaces. And I want to discard any strings that contain non-alphabetic characters.
char temp_text[100];
while(fscanf(fcorpus, "%101[a-zA-Z]s", temp_text) == 1) {
printf("%s\n", temp_text);
}
I've tried the above code both with and without the 's'. I read in another stackoverflow thread that the s when used like that will be interpreted as a literal 's' and not as a string. Either way - when I include the s and when I do not include the s - I can only get the first word from the file I am reading through to print out.
The %[ scan specifier does not skip leading spaces. Either add a space before it or at the end in place of your s. Also you have your 100 and 101 backwards and thus a serious buffer overflow bug.
The s isn't needed.
Here are a few things to try:
Print out the return value from fscanf, and make sure it is 1.
Make sure that the fscanf is consuming the whitespace by using fgetc to get the next character and printing it out.
Let's say that I expect a list of items from the standard input which are separated buy commas, like this:
item1, item2, item3,...,itemn
and I also want to permit the user to emit white-spaces between items and commas, so this kind of input is legal in my program:
item1,item2,item3,...,itemn
If I use scanf like this:
scanf("%s,%s,%s,%s,...,%s", s1, s2, s3, s4,...,sn);
it will fail when there are no white-spaces (I tested it) because it will refer to the whole input as one string. So how can I solve this problem only with C standard library functions?
The quick answer is never, ever use scanf to read user input. It is intended for reading strictly formatted input from files, and even then isn't much good. At the least, you should be reading entire lines and then parsing them with sscanf(), which gives you some chance to correct errors. at best you should be writing your own parsing functions
If you are actually using C++, investigate the use of the c++ string and stream classes, which are much more powerful and safe.
You could have a look at strtok. First read the line into a buffer, then tokenize:
const int BUFFERSIZE = 32768;
char buffer[BUFFERSIZE];
fgets(buffer, sizeof(buffer), stdin);
const char* delimiters = " ,\n";
char* p = strtok(buffer, delimiters);
while (p != NULL)
{
printf("%s\n", pch);
p = strtok(NULL, delimiters);
}
However, with strtok you'll need to be aware of the potential issues related to reentrance.
I guess it is better to write your own parsing function for this. But if you still prefer scanf despite of its pitfalls, you can do some workaround, just substitute %s with %[^, \t\r\n].
The problem that %s match sequence of non white space characters, so it swallows comma too. So if you replace %s with %[^, \t\r\n] it will work almost the same (difference is that %s uses isspace(3) to match space characters but in this case you explicitly specify which space characters to match and this list probably not the same as for isspace).
Please note, if you want to allow spaces before and after comma you must add white space to your format string. Format string "%[^, \t\r\n] , %[^, \t\r\n]" matches strings like "hello,world", "hello, world", "hello , world".