My question is how do I create a format string that will include an entire line of text including embedded spaces?
I have a buffer defined
char buf[380];
When I read in a line of text, I make sure that buf[378] = '\n'; and buf[379] = '\0';
I am editing that buffer and then writing it to an already-opened file using fprintf:
fprintf(outfile, buf);
I am getting this warning when compiling:
ex_split.c: In function ‘main’:
ex_split.c:65:3: warning: format not a string literal and no format arguments [-Wformat-security]
and am not quite sure how to write the format string so that all of buf is written because, as I remember it, %s will stop at the first space. This data has lots of spaces in it, which I need; it is positionally formatted data.
Addendum
Let me restate the problem. I believe %s will cause fprintf to stop scanning at the first 0x20 character in each line of data. Each line contains at least one space, and usually many, mixed with non-space, readable ASCII data. Are my assumptions correct?
%s stops at a space when scanning as part of a scanf format, not when printing as part of a printf format. fprintf(outfile, "%s", buf); will work just fine for you.
If you're sure that your buf string is safe - that is, it doesn't contain any % escapes that might cause undefined behaviour in printf, you could also just ignore/disable the warning in this case.
Change
fprintf(outfile, buf);
to
fprintf(outfile, "%s", buf);
and it will work as expected.
Related
I want to fscanf a CSV file that was written by fprintf. I used the same format string for both, but it didn't work: when I fscanf the file I just made, it doesn't succeed and doesn't even enter the while loop. How do I modify it to make it work?
Below part of my code
part of fprintf
fp = fopen("Out.csv", "w");
fprintf(fp, "%99s,%d,%99s,%99s\n", current->group, current->id, current->name,
current->address);
part of fscanf
fp = fopen("Out.csv", "r");
while (fscanf(fp, "%99s,%d,%99s,%99s\n", group, &id, name, address) == 4) {
head = push_sort(head, group, name, id, address);
printf("%99s", name);
}
I suspect it's because the "%s" specifier in the *scanf() family stops scanning when it finds white space. You can tell fscanf() which specific characters to exclude, and it will stop at the first such character.
I believe the following format string will work
"%99[^,],%d,%99[^,],%99[^,\n]\n"
Read the documentation for the scanf() family to see why I think the pattern will work; search specifically for the [ specifier.
The *scanf() functions are hard to use correctly. If you are generating the line yourself and you're sure of what it contains, you can trust them to work.
You will be safe if you check the return value, which you do. If you fail to read lines that you consider valid, you can fgets() a line from the file and parse it with strchr() or strtok(). I prefer strchr() because:
It doesn't need to alter the input string.
It's thread safe and reentrant.
It allows you to infer more, like the length of the token, without strlen().
I'm using fscanf and fprintf.
I tried to delimit the strings on each line by \t and to read it like so:
fscanf(fp,"%d\t%s\t%s",&t->num,&t->string1,&t->string2);
The file contents:
1[TAB]string1[TAB]some string[NEWLINE]
It does not read properly. If I printf("%d %s %s",t->num,t->string1,t->string2) I get:
1 string1 some
Also I get this compile warning:
warning: format specifies type 'char *' but the argument has type 'char (*)[15]' [-Wformat]
How can I fix this without using binary r/w?
I'm guessing the space in "some string" is the problem. fscanf() reading a string using %s stops at the first whitespace character. To include spaces, use something like:
fscanf(fp, "%d\t%[^\n\t]\t%[^\n\t]", &t->num, &t->string1, &t->string2);
See also a reference page for fscanf() and/or another StackOverflow thread on reading tab-delimited items in C.
[EDIT in response to your edit: You seem to also have a problem with the arguments you're passing into fscanf(). You will need to post the declarations of t->string1 to be sure, but it looks like string1 is an array of characters, and therefore you should remove the & from the fscanf() call...]
The %s conversion specification stops reading at the first white space, and tabs and blanks both count as white space.
If you want to read a string of non-tabs, you can use a 'scan set' conversion specifier:
if (fscanf(fp, "%d\t%[^\t\n]\t%[^\t\n]", &t->num, t->string1, t->string2) != 3)
...oops - format error in input data...
(I'd lay odds that omitting the & from the string arguments is correct.) The question was edited; I win. Dropping the & is necessary to avoid the compiler warning!
This still doesn't quite do what you expect. If there are blanks at the start of the second field, they'll be eaten by the \t in the format string. Any white space in the format string eats any white space (including newlines) in the input. The %[^\t] conversion specification won't get started until there's a character that isn't white space in the input. I'm also assuming you want your input limited by newlines. You can leave out the \n characters if you prefer.
Note that I checked that the fscanf() interpreted 3 fields. It is important to error check your inputs.
If you really want control, you should probably read whole lines with fgets() and then use sscanf() to parse the data.
About fgets() and sscanf(): can you expand on how they give more control?
Suppose the input data is written
1234
a string with spaces
another string
spread out over multiple lines like that. With raw fscanf(), this will be acceptable input even though it is spread over several lines of input. With fgets(), you can read a single line, and then analyze it with sscanf(), and you'll know that the first line was not in the correct format. You can then decide what to do about it.
Also, since mafso called me on it in his comment, we should ensure that there are no buffer overflows by limiting the size of the strings that the scan sets match.
if (fscanf(fp, "%d\t%14[^\t\n]\t%14[^\t\n]", &t->num, t->string1, t->string2) != 3)
...oops - format error in input data...
I'm using the error message about char (*)[15] to deduce that 14 is the correct number to use. Note that unlike printf(), you can't specify the sizes via * notation (in the scanf()-family, * suppresses assignment), so you have to create the format with the correct sizes. Further, the size you specify is the number of characters before the terminating null byte, so if the array is of size 15, the size you specify in the format string is 14, as shown.
I have something like this
char string[]="My name is %s";
printf("%s",string,"Tom");
I expect the output to be My name is Tom but I am getting My name is %s
I tried escaping the % character, but it didn't work.
What's going wrong here? How shall I have a %s inside a %s?
Try something like this
printf(string,"Tom");
The problem with your method is this -
As soon as printf sees the %s format specifier in your format string, it assumes that the next argument in the list is a string pointed to by a character pointer, retrieves it and prints it in place of %s. Now, printf doesn't do a recursive replacement, and hence the %s inside your string remains as it is and "Tom" is now an extra argument which is just discarded.
There is only one expansion during printf; that means any strings passed to printf except the format strings will be printed verbatim (if at all). That is actually a good thing, because otherwise, it leaves a huge security hole.
The security risk relates to the fact that the format string and the parameter list have to correspond. That means, if an unwanted % makes it to the format string, you will get in trouble:
char ch[50];
fgets(ch, 50, stdin);
printf(ch);
If the user supplies eg. %p %p %p %p, he will be reading data stored on the stack (like the return address and so on), if he supplies %s %s %s, he'll likely crash the program. If he supplies %n, he'll overwrite some data on the stack.
That said, you can just compute the format string if you want:
char ch[50];
char format_prototype[]="My name is %s";
snprintf(ch, 49, "%s", format_prototype);
ch[49]=0;
printf(ch, "Tom");
printf(string, "Tom") maybe?
The problem is with the line printf("%s",string,"Tom");
You should use
char string[]="My name is %s";
printf(string,"Tom");
here you will get the output as
My name is Tom
The first parameter to printf is the format string. The rest are all parameters which will be formatted according to the format string. The format strings do not nest in any way. In other words, even if a string to be formatted happens to contain formatting instruction, it is simply printed and not interpreted as another format string.
If you want to have that kind of formatting indirection, you would have to first generate a new format string (sprintf is useful for that):
char string[] = "My name is %s";
char format[100];
sprintf(format, "%s", string);
and then use the newly generated format string:
printf(format, "Tom");
I have been told by most experts, and also by users on StackOverflow, that scanf should not be used when the user inputs a string, and that I should go for gets() instead. I never asked on StackOverflow why one should not use scanf over gets for strings. This is not the actual question, but an answer to it would be greatly appreciated.
Now coming to the actual question. I came across this type of code -
scanf("%[^\n]s",a);
This reads a string until user inputs a new line character, considering the white spaces also as string.
Is there any problem if I use
scanf("%[^\n]s",a);
instead of gets?
Is gets more optimized than scanf? It sounds like it, since gets is purely dedicated to handling strings. Please let me know about this.
Update
This link helped me to understand it better.
gets(3) is dangerous and should be avoided at all costs. I cannot envision a use where gets(3) is not a security flaw.
scanf(3)'s %s is also dangerous -- you must use the "field width" specifier to indicate the size of the buffer you have allocated. Without the field width, this routine is as dangerous as gets(3):
char name[64];
scanf("%63s", name);
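The GNU C library provides the a modifier to %s that allocates the buffer for you. This non-portable extension is probably less difficult to use correctly (newer glibc versions and POSIX.1-2008 spell the modifier m instead, because %a is a floating-point conversion in C99):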
The GNU C library supports a nonstandard extension that
causes the library to dynamically allocate a string of
sufficient size for input strings for the %s and %a[range]
conversion specifiers. To make use of this feature, specify
a as a length modifier (thus %as or %a[range]). The caller
must free(3) the returned string, as in the following
example:
char *p;
int n;
errno = 0;
n = scanf("%a[a-z]", &p);
if (n == 1) {
printf("read: %s\n", p);
free(p);
} else if (errno != 0) {
perror("scanf");
} else {
fprintf(stderr, "No matching characters\n");
}
As shown in the above example, it is only necessary to call
free(3) if the scanf() call successfully read a string.
Firstly, it is not clear what that s is doing in your format string. The %[^\n] part is a self-sufficient format specifier. It is not a modifier for %s format, as you seem to believe. This means that "%[^\n]s" format string will be interpreted by scanf as two independent format specifiers: %[^\n] followed by a lone s. This will direct scanf to read everything until \n is encountered (leaving \n unread), and then require that the next input character is s. This just doesn't make any sense. No input will match such self-contradictory format.
Secondly, what was apparently meant is scanf("%[^\n]", a). This is somewhat close to [no longer available] gets (or fgets), but it is not the same. scanf requires that each format specifier matches at least one input character. scanf will fail and abort if it cannot match any input characters for the requested format specifier. This means that scanf("%[^\n]",a) is not capable of reading empty input lines, i.e. lines that contain the \n character immediately. If you feed such a line into the above scanf, it will return 0 to indicate failure and leave a unchanged. That's very different from how typical line-based input functions work.
(This is a rather surprising and seemingly illogical property of the %[] format. Personally, I'd prefer %[] to be able to match empty sequences and produce empty strings, but that's not how standard scanf works.)
If you want to read the input in line-by-line fashion, fgets is your best option.
I'm reading in a .txt file. I'm using fscanf to get the data as it is formatted.
The line I'm having problems with is this:
result = fscanf(fp, "%s", ap->name);
This is fine until I have a name with a whitespace eg: St Ives
So I use this to read in the white space:
result = fscanf(fp, "%[^\n]s", ap->name);
However, when I try to read in the first name (with no white space) it just doesn't work and messes up the other fscanf.
But when I use the [^\n] it works fine with a different file I'm using. Not sure what is happening.
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Edit//
Ok, so if I use:
result = fscanf(fp, "%s", ap->name);
result = fscanf(fp, "%[^\n]s", ap->name);
This allows me to read in a string with no white space. But When I get a "name" with whitespace it doesn't work.
One problem with this:
result = fscanf(fp, "%[^\n]s", ap->name);
is that you have an extra s at the end of your format specifier. The entire format specifier should just be %[^\n], which says "read in a string which consists of characters which are not newlines". The extra s is not part of the format specifier, so it's interpreted as a literal: "read the next character from the input; if it's an "s", continue, otherwise fail."
The extra s doesn't actually hurt you, though. You know exactly what the next character of input is: a newline. It doesn't match, and input processing stops there, but it doesn't really matter since it's the end of your format specifier. This would cause problems, though, if you had other format specifiers after this one in the same format string.
The real problem is that you're not consuming the newline: you're only reading in all of the characters up to the newline, but not the newline itself. To fix that, you should do this:
result = fscanf(fp, "%[^\n]%*c", ap->name);
The %*c specifier says to read in a character (c), but don't assign it to any variable (*). If you omitted the *, you would have to pass fscanf() another parameter containing a pointer to a character (a char*), where it would then store the resulting character that it read in.
You could also use %[^\n]\n, but that would also read in any whitespace which followed the newline, which may not be what you want. When fscanf finds whitespace in its format specifier (a space, newline, or tab), it consumes as much whitespace as it can (i.e. you can think of it consuming the longest string that matches the regular expression [ \t\n]*).
Finally, you should also specify a maximum length to avoid buffer overruns. You can do this by placing the buffer length in between the % and the [. For example, if ap->name is a buffer of 256 characters, you should do this:
result = fscanf(fp, "%255[^\n]%*c", ap->name);
This works great for statically allocated arrays; unfortunately, if the array is dynamically sized at runtime, there's no easy way to pass the buffer size to fscanf. You'll have to create the format string with snprintf, e.g.:
char format[256];
snprintf(format, sizeof(format), "%%%d[^\n]%%*c", buffer_size - 1);
result = fscanf(fp, format, ap->name);
Jumm wrote:
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Which is a far easier problem to solve so go with it:
/* Read the line, then overwrite the trailing newline (if any) */
if( fgets( ap->name, MAX, fp ) != NULL )
{
    char *nlptr = strrchr( ap->name, '\n' ) ;
    if( nlptr != NULL )
    {
        *nlptr = '\0' ;
    }
}
I'm not sure how you mean [^\n] is supposed to work. %[...] is a conversion specification that matches a sequence of characters from the set listed between the brackets; a leading ^ inverts the set, so %[^\n] matches everything up to, but not including, the next newline. %s with fscanf only reads until it comes across a whitespace delimiter. For strings with spaces in them, use a combination of fgets and sscanf instead, and specify a restriction on the length.
As far as I can gather, you are trying to use a regular expression in the fscanf function. To my knowledge fscanf does not support regular expressions, nor have I seen this anywhere; enlighten me on this.
The format specifier for reading a string is %s, it could be that you need to do it this way, %s\n which will pick up the newline.
But for Pete's sake, do not use the old gets family of functions, as in Clifford's answer above; that is where buffer overflows happen. gets was exploited by an infamous worm, the Morris Worm of 1988, which attacked the fingerd daemon through its call to gets and caused chaos. Fortunately, that has since been patched. Furthermore, a lot of programmers have been drilled not to use the function.
Even Microsoft has adopted a safe version of gets family of functions, that specifies a parameter to indicate the length of buffer instead.
EDIT
My bad - I did not realize that Clifford indeed has specified the max length for input...Whoops! Sorry! Clifford's answer is correct! So +1 to Clifford's answer.
Thanks Neil for pointing out my error...
Hope this helps,
Best regards,
Tom.
I found the problem.
As Paul Tomblin said, I had an extra new line character in the field above. So using what tommieb75 said I used:
result = fscanf(fp, "%s\n", ap->code);
result = fscanf(fp, "%[^\n]s", ap->name);
And this fixed it!
Thanks for your help.