In my program, I use sscanf to check whether a string is of a given format. To do so, I provide the number of arguments in the format string and check whether sscanf returns that same number when parsing the input.
As part of a primitive parser, I want to check whether a string matches one of many formats. The sscanf function is variadic, so how do I deal with the varying number of arguments I need to pass?
Currently, I just pass a very large number of arguments (e.g. 50) to the function, and just hope that the format strings don't contain more arguments.
Is there any better way to do this?
You really need something heavier than scanf. You have to tell scanf what format your input is in; it can't figure anything out on its own.
If you have access to POSIX, look at regex.h; it's probably everything you need.
Otherwise, you're stuck rolling your own. lex and yacc are nice if the format is rather complex, but otherwise, either strtok or (getchar+switch) is probably the way to go.
Edit:
Since you can use POSIX, here's a simple example of how to extract data from a regex in C. (Error checking excluded for brevity.)
char txt[] = "232343341235898dfsfgs/.f";
regex_t reg;
regmatch_t refs[MAX_REFS]; //the maximum number of groups you want to extract, plus one for the whole match
regcomp(&reg, "3433\\([0-5]*\\).*", 0); //replace 0 with REG_EXTENDED if desired
regexec(&reg, txt, MAX_REFS, refs, 0);
regfree(&reg);
txt[refs[1].rm_eo] = '\0'; //refs[0] is the whole match; refs[1] is the first group
int n = atoi(txt+refs[1].rm_so);
printf("%d\n", n);
Prints
41235
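For reference, here is roughly what the same extraction might look like with the error checking put back in. This is only a sketch: the helper name extract_group and its return convention are made up for illustration, and it assumes a POSIX system that provides regex.h.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copies the first capture group of a POSIX basic regex match into out.
   Returns 0 on success, -1 on compile failure, no match, or if the
   group does not fit. (extract_group is a name made up for this sketch.) */
int extract_group(const char *pattern, const char *text,
                  char *out, size_t outsize)
{
    regex_t reg;
    regmatch_t refs[2];     /* refs[0]: whole match, refs[1]: group 1 */
    int rc = regcomp(&reg, pattern, 0);

    if (rc != 0) {
        char msg[128];
        regerror(rc, &reg, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }
    rc = regexec(&reg, text, 2, refs, 0);
    regfree(&reg);
    if (rc != 0 || refs[1].rm_so == -1)
        return -1;

    size_t len = (size_t)(refs[1].rm_eo - refs[1].rm_so);
    if (len >= outsize)
        return -1;
    memcpy(out, text + refs[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Copying the group into a separate buffer avoids modifying the input string, at the cost of having to choose a buffer size up front.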
You should probably use lex/yacc to build a proper parser. Alternatively, first tokenizing the string with strtok might simplify your problem. (Beware: It is really tricky to use strtok correctly -- read its documentation very carefully.)
I'm not sure it answers your question, but you use varargs in C to allow a variable number of arguments to a function.
void myscanf(const char *fmt, ...)
{
}
The unhelpful answer is "don't do that, write a parser properly, maybe using lex and/or yacc or bison".
The answer to the question you asked is "yes, you could do that". I don't believe there's any reason why there can't be more variadic parameters than the format requires, although too few would be a bad thing. I'm presuming that you have an array or list of possible formats and you're calling sscanf in a loop.
You can write a validation function using the variable length arguments using the macros available in stdarg.h.
For example,
int my_validation_func(const char *format, ...) {
    va_list ap;
    const char *p;
    char *sval;
    int ival;
    double fval;
    va_start(ap, format);
    for(p=format; *p ; p++) {
        if (*p != '%') {
            continue;
        }
        switch(*++p) {
        case 'd':
            ival = va_arg(ap, int);
            break;
        case 'f':
            fval = va_arg(ap, double); /* float arguments are promoted to double */
            break;
        case 's':
            for (sval = va_arg(ap, char *); *sval; sval++);
            break;
        case '\0':
            p--; /* lone '%' at the end: let the loop see the terminator */
            break;
        default:
            break;
        }
    }
    va_end(ap);
    return 0;
}
Hope this helps!
If you don't know when you're writing the code the number and type(s) of the arguments, sscanf() cannot safely do what you're trying to do.
Passing 50 arguments to sscanf() is ok (arguments not consumed by the format string are evaluated but otherwise ignored), but the arguments that correspond to the format string have to be of the expected type, after promotion; otherwise, the behavior is undefined. So if you want to detect whether a string can be scanned with either "%d" or "%f", you can't safely do it with a single sscanf() call. (It's likely you could get away with passing a void* that points to a sufficiently large buffer, but the behavior is still undefined.)
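One way to stay within defined behavior is to give each candidate format its own call with correctly typed arguments. The sketch below assumes just two candidate formats; the names classify_token and token_kind are invented for illustration, and a trailing %n is used to confirm that the whole string was consumed. It still inherits sscanf()'s lack of overflow handling.

```c
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum token_kind { TOKEN_NONE, TOKEN_INT, TOKEN_FLOAT };

/* Try each format in its own sscanf() call so every argument has the
   type its conversion expects. %n records how many characters were
   consumed, which lets us reject trailing garbage. */
enum token_kind classify_token(const char *s)
{
    int i, used;
    float f;

    used = 0;
    if (sscanf(s, "%d%n", &i, &used) == 1 && s[used] == '\0')
        return TOKEN_INT;

    used = 0;
    if (sscanf(s, "%f%n", &f, &used) == 1 && s[used] == '\0')
        return TOKEN_FLOAT;

    return TOKEN_NONE;
}
```

The order of the attempts matters: "%f" would also accept "42", so the stricter format is tried first.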
Another nasty problem with sscanf() is that it doesn't handle numeric overflow. This:
char *s = "9999999999999999999999999";
int n;
int result = sscanf(s, "%d", &n);
printf("result = %d, n = %d\n", result, n);
has undefined behavior (assuming 9999999999999999999999999 is too big to be stored in an int).
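If overflow matters, strtol() reports it in a well-defined way. A minimal sketch, assuming base-10 input and no tolerance for trailing characters (parse_int is a name chosen here, not a standard function):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a base-10 int from s. Returns 1 on success and stores the value
   in *out; returns 0 if s is not a number, has trailing characters, or
   does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;               /* no digits, or junk after the number */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* out of range for long or for int */
    *out = (int)v;
    return 1;
}
```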
Something you might be able to do is find an open-source sscanf implementation and modify it so it just verifies the string against the format, without storing anything. (Dealing with the license for the implementation is left as an exercise.) This makes sense if you find sscanf-style format strings particularly convenient for your problem. Otherwise, regular expressions are probably the way to go (not in the C standard, but it's easy enough to find an implementation).
Related
I want to implement a wrapper function for C's sscanf without using vsscanf, because my environment provides only sscanf, not vsscanf(). I also don't want to write a complete implementation of sscanf, because then I would need to consider all possible scenarios. I have seen some samples on Google, but they don't cover all scenarios.
So now I want to implement like below:
int my_sscanf(char * buf, char format[], ...)
{
va_list vargs = {0};
va_start(vargs, format);
//some loop to get the variable arguments
//and call again sscanf() here.
va_end (vargs);
}
Ouch! Here's a hammer; it'll be more fun hitting yourself on the head with it. Seriously, that's a non-trivial proposition.
You'll need a loop that scans through the format string, reading characters from the buffer when they're normal characters, remembering that spaces in the format chew up zero or more spaces in the buffer. When you encounter a conversion specification, you'll need to create a singleton format string containing the user-supplied conversion specification plus a %n conversion specification. You'll invoke:
int pos;
int rc = sscanf(current_pos_in_buf, manufactured_format_with_percent_n,
appropriate_pointer_from_varargs, &pos);
If rc is not 1, you'll fail. Otherwise, you update the current position in the buffer using the value stored in pos, and then repeat. Note that scanning a conversion specification is not trivial. Also, if there is an assignment-suppressing * in the specification, you'll have to expect a 0 back from sscanf() (and not provide the appropriate pointer from the variable args).
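To make the %n step concrete, here is a much-reduced sketch that handles only a fixed "%d" conversion, so the manufactured format is simply "%d%n". The function name scan_ints is hypothetical; a real wrapper would also have to parse arbitrary conversion specifications out of the caller's format string and fetch the matching pointer from the va_list.

```c
#include <stdio.h>

/* Reads whitespace-separated integers from buf one conversion at a time,
   advancing by the offset that %n reports. Returns how many were read
   and stores their sum in *sum. */
int scan_ints(const char *buf, long *sum)
{
    const char *cur = buf;
    int value, pos, count = 0;

    *sum = 0;
    /* "%d%n": one user conversion plus the appended %n. sscanf() does
       not count %n in its return value, so success is still 1. */
    while (sscanf(cur, "%d%n", &value, &pos) == 1) {
        *sum += value;
        count++;
        cur += pos;     /* continue where this conversion stopped */
    }
    return count;
}
```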
Try telling your compiler to compile your code as C99. If that still doesn't work, your libc does not comply with the C99 standard – in that case, get a proper libc.
E.g. if you're using gcc, try adding -std=c99 to the compiler command line.
There's a slightly simpler way to do this using the preprocessor, but it's a little hacky. Take this as an example:
#define my_sscanf(buf, fmt, ...) do { \
    do_something(); \
    sscanf((buf), (fmt), __VA_ARGS__); \
    do_something_else(); } while (0)
I have the following code :
void test(int N)
{
printf("%d", N);
}
int main(int ac, char **av)
{
test("");
return 0;
}
I have a function test that expects an integer argument, but in main, when I call the function, I pass a string argument and C converts it to an integer and prints it for me. What I want is to give an error if someone passes a string. How can I check whether the argument is a string?
Thanks !
void test(int N) { /* ... */ }
...
test("");
That function call is simply invalid. test requires an argument of type int, or of something that's implicitly convertible to int (any arithmetic type will do). "" is a string literal; in this context, it's converted to a char* value which points to the '\0' character which is the first (and last, and only) character of the array.
There is no implicit conversion from char* to int. A conforming compiler must issue a diagnostic for the invalid call, and it may (and IMHO should) reject it outright. It's exactly as invalid as trying to take the square root of a string literal, or add 42 to a structure.
Older versions of C (before the 1989 ANSI standard) were more lax about this kind of thing, and that laxity survives into some modern compilers. It's likely that, if your compiler doesn't reject the call, it will take the address of the string literal and convert it to an int. The result of this conversion is largely meaningless; such a compiler really isn't doing you any favors by permitting it.
If your compiler doesn't reject, or at the very least warn about, the call, you should enable whatever options are necessary to make it do so. For gcc, for example, you might try something like:
gcc -std=c99 -pedantic -Wall -Wextra filename.c
You can drop the -pedantic if you want to use gcc-specific extensions. There are several possible arguments for the -std= option. See the gcc documentation for more information -- or the documentation for whatever compiler you're using.
If you're asking about validating user input (i.e., input from someone running your program rather than writing C code), user input is not generally in the form of numbers. It's in the form of text, sequences of characters. For example, you might use the fgets() function to read a line of text from standard input. You can then, if you like, check whether that line has the form of an integer literal. One way to do that is to use the sscanf function. A quick example:
#include <stdio.h>
int main(void) {
char line[200];
int n;
printf("Enter an integer: ");
fflush(stdout);
fgets(line, sizeof line, stdin);
if (sscanf(line, "%d", &n) == 1) {
printf("You entered %d (0x%x)\n", n, (unsigned)n);
}
else {
printf("You did not enter an integer\n");
}
}
But if your question is about someone writing C code that calls a function you provide, the compiler will check that any arguments are of a valid type.
what I want is that if someone passes a string than I give an error
That's not really your problem. Most compilers will give a warning for this, I think -- presuming warnings are enabled.
The issue is that C always passes by value, but a string argument is an array, and its value is the address of the first character in the array -- a pointer, but pointer values can be treated as integers. Again, most compilers are smart enough to catch this ambiguity, if you use them properly.
You can't completely bulletproof your code against people who use it improperly. You write an API, you document it, but you don't have to cover cases for those who cannot use basic tools properly.
The core standard C library does not include checks of the sort you are looking for here, so it seems pointless to incorporate them into your API -- there are oodles of built-in standard commands with int args to which an array can be passed in the same way. Saving someone from doing something stupid with your library won't save them from doing the exact same thing with the base C lib -- i.e., you can't stop them from passing pointers in place of ints. Period.
The naive approach is something like this:
int is_integer( const char *s )
{
if( *s == '-' || *s == '+' ) s++;
while( isdigit((unsigned char)*s) ) s++;
return *s == 0;
}
That will tell you if all characters are digits, with an optional sign. It's not particularly robust, however. It can't handle whitespace, and it doesn't check the integer is in the valid range.
However, it might be enough for your needs.
Example:
int main( int argc, char **argv )
{
int val;
if( argc <= 1 || !is_integer(argv[1]) ) {
fprintf( stderr, "Syntax: %s val\n\nWhere val is an integer\n", argv[0] );
return 1;
}
val = strtol( argv[1], NULL, 10 );
test(val);
return 0;
}
Compile with -Wall and -Werror and your problem will magically go away.
gcc -Wall -Werror file.c
I have this snippet of the code:
char* receiveInput(){
char *s;
scanf("%s",s);
return s;
}
int main()
{
char *str = receiveInput();
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
I receive this output:
Your string is hellàÿ", length is 11
my input was:
helloworld!
can somebody explain why, and why this style of the coding is bad, thanks in advance
Several questions have addressed what you've done wrong and how to fix it, but you also said (emphasis mine):
can somebody explain why, and why this style of the coding is bad
I think scanf is a terrible way to read input. It's inconsistent with printf, makes it easy to forget to check for errors, makes it hard to recover from errors, and is incompatible with ordinary (and easier to do correctly) read operations (like fgets and company).
First, note that the "%s" format will read only until it sees whitespace. Why whitespace? Why does "%s" print out an entire string, but reads in strings in such a limited capacity?
If you'd like to read in an entire line, as you may often be wont to do, scanf provides... "%[^\n]". What? What is that? When did this become Perl?
But the real problem is that neither of those are safe. They both freely overflow with no bounds checking. Want bounds checking? Okay, you got it: "%10s" (and "%10[^\n]" is starting to look even worse). That will read at most 10 characters and then add a terminating null character, so the array needs room for 11. So that's good... for when our array size never needs to change.
What if we want to pass the size of our array as an argument to scanf? printf can do this:
char string[] = "Hello, world!";
printf("%.*s\n", (int) sizeof string, string); // prints whole message; the precision must be an int
printf("%.*s\n", 6, string); // prints just "Hello,"
Want to do the same thing with scanf? Here's how:
static char tmp[/*bit twiddling to get the log10 of SIZE_MAX plus a few*/];
// if we did the math right we shouldn't need to use snprintf
snprintf(tmp, sizeof tmp, "%%%zus", bufsize - 1); // bufsize is a size_t; leave room for the terminator
scanf(tmp, buffer);
That's right - scanf doesn't support the "%.*s" variable precision printf does, so to do dynamic bounds checking with scanf we have to construct our own format string in a temporary buffer. This is all kinds of bad, and even though it's actually safe here it will look like a really bad idea to anyone just dropping in.
Meanwhile, let's look at another world. Let's look at the world of fgets. Here's how we read in a line of data with fgets:
fgets(buffer, bufsize, stdin);
Infinitely less headache, no wasted processor time converting an integer precision into a string that will only be reparsed by the library back into an integer, and all the relevant elements are sitting there on one line for us to see how they work together.
Granted, this may not read an entire line. It will only read an entire line if the line is shorter than bufsize - 1 characters. Here's how we can read an entire line:
char *readline(FILE *file)
{
    size_t size = 80; // start off small
    size_t curr = 0;
    char *buffer = malloc(size);
    if(buffer == NULL) return NULL;
    while(fgets(buffer + curr, size - curr, file))
    {
        if(strchr(buffer + curr, '\n')) return buffer; // success
        curr = size - 1;
        size *= 2;
        char *tmp = realloc(buffer, size);
        if(tmp == NULL) { free(buffer); return NULL; } // out of memory
        buffer = tmp;
    }
    free(buffer); // EOF or read error before a newline was seen
    return NULL;
}
The curr variable is an optimization to prevent us from rechecking data we've already read, and is unnecessary (although useful as we read more data). We could even use the return value of strchr to strip off the ending "\n" character if you preferred.
Notice also that size_t size = 80; as a starting place is completely arbitrary. We could use 81, or 79, or 100, or add it as a user-supplied argument to the function. We could even add an int (*inc)(int) argument, and change size *= 2; to size = inc(size);, allowing the user to control how fast the array grows. These can be useful for efficiency, when reallocations get costly and boatloads of lines of data need to be read and processed.
We could write the same with scanf, but think of how many times we'd have to rewrite the format string. We could limit it to a constant increment, instead of the doubling (easily) implemented above, and never have to adjust the format string; we could give in and just store the number, do the math with as above, and use snprintf to convert it to a format string every time we reallocate so that scanf can convert it back to the same number; we could limit our growth and starting position in such a way that we can manually adjust the format string (say, just increment the digits), but this could get hairy after a while and may require recursion (!) to work cleanly.
Furthermore, it's hard to mix reading with scanf with reading with other functions. Why? Say you want to read an integer from a line, then read a string from the next line. You try this:
int i;
char buf[BUFSIZE];
scanf("%i", &i);
fgets(buf, BUFSIZE, stdin);
That will read the "2" but then fgets will read an empty line because scanf didn't read the newline! Okay, take two:
...
scanf("%i\n", &i);
...
You think this eats up the newline, and it does - but it also eats up leading whitespace on the next line, because scanf can't tell the difference between newlines and other forms of whitespace. (Also, turns out you're writing a Python parser, and leading whitespace in lines is important.) To make this work, you have to call getchar or something to read in the newline and throw it away:
...
scanf("%i", &i);
getchar();
...
Isn't that silly? What happens if you use scanf in a function, but don't call getchar because you don't know whether the next read is going to be scanf or something saner (or whether or not the next character is even going to be a newline)? Suddenly the best way to handle the situation seems to be to pick one or the other: do we use scanf exclusively and never have access to fgets-style full-control input, or do we use fgets exclusively and make it harder to perform complex parsing?
Actually, the answer is we don't. We use fgets (or non-scanf functions) exclusively, and when we need scanf-like functionality, we just call sscanf on the strings! We don't need to have scanf mucking up our filestreams unnecessarily! We can have all the precise control over our input we want and still get all the functionality of scanf formatting. And even if we couldn't, many scanf format options have near-direct corresponding functions in the standard library, like the infinitely more flexible strtol and strtod functions (and friends). Plus, i = strtoumax(str, NULL, 10) for C99 sized integer types is a lot cleaner looking than scanf("%" SCNuMAX, &i);, and a lot safer (we can use that strtoumax line unchanged for smaller types and let the implicit conversion handle the extra bits, but with scanf we have to make a temporary uintmax_t to read into).
The moral of this story: avoid scanf. If you need the formatting it provides, and don't want to (or can't) do it (more efficiently) yourself, use fgets / sscanf.
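As a rough sketch of that fgets / sscanf combination, here is the "integer on one line, string on the next" case from above. The function name, the int-typed size parameter, and the 64-byte scratch line are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Reads an integer from one line of `in`, then the following line into
   buf (without its trailing newline). Returns 1 on success, 0 otherwise. */
int read_int_then_line(FILE *in, int *i, char *buf, int bufsize)
{
    char line[64];

    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    if (sscanf(line, "%i", i) != 1)     /* parse the string, not the stream */
        return 0;
    if (fgets(buf, bufsize, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';     /* drop the newline, if any */
    return 1;
}
```

Because the newline after the number is consumed by the first fgets call, the second line arrives intact, leading whitespace included.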
scanf doesn't allocate memory for you.
You need to allocate memory for the variable passed to scanf.
You could do like this:
char* receiveInput(){
char *s = (char*) malloc( 100 );
scanf("%s",s);
return s;
}
But two warnings:
First, the function that calls receiveInput takes ownership of the returned memory: you'll have to free(str) after you print it in main. (Giving ownership away like this is usually not considered good practice.)
An easy fix is to take the allocated memory as a parameter.
Second, if the input string is longer than 99 characters (in my case), your program will suffer a buffer overflow (which is what is already happening).
An easy fix is to pass the length of your buffer to scanf:
scanf("%99s",s);
A fixed code could be like this:
// s must be of at least 100 chars!!!
char* receiveInput( char *s ){
scanf("%99s",s);
return s;
}
int main()
{
char str[100];
receiveInput( str );
int length = strlen(str);
printf("Your string is %s, length is %d\n", str, length);
return 0;
}
You have to first allocate memory to your s object in your receiveInput() method. Such as:
s = (char *)calloc(50, sizeof(char));
What is the use of the %n format specifier in C? Could anyone explain with an example?
Most of these answers explain what %n does (which is to print nothing and to write the number of characters printed thus far to an int variable), but so far no one has really given an example of what use it has. Here is one:
int n;
printf("%s: %nFoo\n", "hello", &n);
printf("%*sBar\n", n, "");
will print:
hello: Foo
Bar
with Foo and Bar aligned. (It's trivial to do that without using %n for this particular example, and in general one always could break up that first printf call:
int n = printf("%s: ", "hello");
printf("Foo\n");
printf("%*sBar\n", n, "");
Whether the slightly added convenience is worth using something esoteric like %n (and possibly introducing errors) is open to debate.)
Nothing printed. The argument must be a pointer to a signed int, where the number of characters written so far is stored.
#include <stdio.h>
int main()
{
int val;
printf("blah %n blah\n", &val);
printf("val = %d\n", val);
return 0;
}
The previous code prints:
blah blah
val = 5
I haven't really seen many practical real world uses of the %n specifier, but I remember that it was used in oldschool printf vulnerabilities with a format string attack quite a while back.
Something that went like this
void authorizeUser( char * username, char * password){
...code here setting authorized to false...
printf(username);
if ( authorized ) {
giveControl(username);
}
}
where a malicious user could take advantage of the username parameter getting passed into printf as the format string and use a combination of %d, %c or whatever to walk up the call stack, then use %n to set the variable authorized to a true value.
Yeah it's an esoteric use, but always useful to know when writing a daemon to avoid security holes? :D
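The usual defence is to never pass untrusted text as the format string itself. A minimal sketch (format_greeting is a hypothetical helper; the point is only the literal "%s" format):

```c
#include <stdio.h>

/* Formats a greeting for an untrusted username. The name is passed as an
   argument to "%s", never as the format string itself, so conversion
   specifications inside it are copied literally. Returns snprintf()'s
   result. */
int format_greeting(char *out, size_t outsize, const char *username)
{
    return snprintf(out, outsize, "Hello, %s", username);
}
```

With this shape, a username such as "%n%n%s" is just text to be copied.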
From here we see that it stores the number of characters printed so far.
n The argument shall be a pointer to an integer into which is written the number of bytes written to the output so far by this call to one of the fprintf() functions. No argument is converted.
An example usage would be:
int n_chars = 0;
printf("Hello, World%n", &n_chars);
n_chars would then have a value of 12.
So far all the answers are about what %n does, but not why anyone would want it in the first place. I find it's somewhat useful with sprintf/snprintf, when you might need to later break up or modify the resulting string, since the value stored is an array index into the resulting string. This application is a lot more useful, however, with sscanf, especially since functions in the scanf family don't return the number of chars processed but the number of fields.
Another really hackish use is getting a pseudo-log10 for free at the same time while printing a number as part of another operation.
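A small sketch of the snprintf case, assuming a C library where %n is enabled (some disable it by default). The helper name format_pair and the "key=value" layout are made up for illustration.

```c
#include <stdio.h>

/* Writes "key=value" into out and returns the index at which the value
   starts, or -1 if the result did not fit. */
int format_pair(char *out, size_t outsize, const char *key, int value)
{
    int value_start = 0;
    int len = snprintf(out, outsize, "%s=%n%d", key, &value_start, value);

    if (len < 0 || (size_t)len >= outsize)
        return -1;
    return value_start;
}
```

The returned index can then be used to split or rewrite the string later without searching for the '=' again.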
The argument associated with the %n will be treated as an int* and is filled with the number of total characters printed at that point in the printf.
The other day I found myself in a situation where %n would nicely solve my problem. Unlike my earlier answer, in this case, I cannot devise a good alternative.
I have a GUI control that displays some specified text. This control can display part of that text in bold (or in italics, or underlined, etc.), and I can specify which part by specifying starting and ending character indices.
In my case, I am generating the text to the control with snprintf, and I'd like one of the substitutions to be made bold. Finding the starting and ending indices to this substitution is non-trivial because:
The string contains multiple substitutions, and one of the substitutions is arbitrary, user-specified text. This means that doing a textual search for the substitution I care about is potentially ambiguous.
The format string might be localized, and it might use the $ POSIX extension for positional format specifiers. Therefore searching the original format string for the format specifiers themselves is non-trivial.
The localization aspect also means that I cannot easily break up the format string into multiple calls to snprintf.
Therefore the most straightforward way to find the indices around a particular substitution would be to do:
char buf[256];
int start;
int end;
snprintf(buf, sizeof buf,
"blah blah %s %f yada yada %n%s%n yakety yak",
someUserSpecifiedString,
someFloat,
&start, boldString, &end);
control->set_text(buf);
control->set_bold(start, end);
It doesn't print anything. It is used to figure out how many characters got printed before %n appeared in the format string, and output that to the provided int:
#include <stdio.h>
int main(int argc, char* argv[])
{
int resultOfNSpecifier = 0;
_set_printf_count_output(1); /* Required in visual studio */
printf("Some format string%n\n", &resultOfNSpecifier);
printf("Count of chars before the %%n: %d\n", resultOfNSpecifier);
return 0;
}
(Documentation for _set_printf_count_output)
It stores the number of characters printed so far by that printf() call.
Example:
int a;
printf("Hello World %n \n", &a);
printf("Characters printed so far = %d",a);
The output of this program will be
Hello World
Characters printed so far = 12
Those who want to use %n Format Specifier may want to look at this:
Do Not Use the "%n" Format String Specifier
In C, use of the "%n" format specification in printf() and sprintf()
type functions can change memory values. Inappropriate
design/implementation of these formats can lead to a vulnerability
generated by changes in memory content. Many format vulnerabilities,
particularly those with specifiers other than "%n", lead to
traditional failures such as segmentation fault. The "%n" specifier
has generated more damaging vulnerabilities. The "%n" vulnerabilities
may have secondary impacts, since they can also be a significant
consumer of computing and networking resources because large
quantities of data may have to be transferred to generate the desired
pointer value for the exploit. Avoid using the "%n" format
specifier. Use other means to accomplish your purpose.
Source: link
In my opinion, %n in the first argument of printf simply records the number of characters printed before it reaches the %n format code, including whitespace and newline characters.
#include <stdio.h>
int main()
{
int i;
printf("%d %f\n%n", 100, 123.23, &i);
printf("%d'th characters printed on the screen before '%%n'", i);
}
output:
100 123.230000
15'th characters printed on the screen before '%n'(with new character).
We can assign the value of i in another way.
As we know, the prototype of printf is:
int printf(const char *format, ...);
It returns the number of characters output, so we can assign that return value to i.
#include <stdio.h>
int main()
{
int i;
i = printf("%d %f\n", 100, 123.23);
printf("%d'th characters printed on the screen.", i);
}
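%n is standard C (it was already in C89, not only C99), but Visual C++ disables it by default; it has to be enabled with _set_printf_count_output.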
I'm creating a function in C to convert an index value into a string, which is a verbose description of the "field" represented by the index.
So, I have a nice array with all the verbose descriptions indexed by, well the index.
To dump it into a buffer I use code like this
#define BUFFER_SIZE 40
void format_verbose(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
snprintf(my_buffer, BUFFER_SIZE, "%s", MY_ARRAY[my_index].description);
}
The problem comes for some cases I need to insert some other strings into the string when formatting it. So what I want is something like this (where the description in this case contains a %s).
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
// ...
snprintf(my_buffer, BUFFER_SIZE, MY_ARRAY[my_index].description,
some_string);
}
Our make file is set up to make this (dangerous) use of snprintf() warn, and warnings are treated as errors. So, it won't compile. I would like to turn off the warning for just this line, where although it is somewhat dangerous, I will control the string, and I can test to ensure it works with every value it's called with.
Alternatively, I would be happy to do this some other way, but I'm really not keen to use this solution
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
// ...
snprintf(my_buffer, BUFFER_SIZE, "%s%s%s",
MY_ARRAY[my_index].description1, some_string,
MY_ARRAY[my_index].description2);
}
Because it makes my description array ugly, especially for the ones where I don't need to add extra values.
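Older versions of GCC don't have the ability to turn off warnings on a line-by-line basis, so you may be out of luck there; newer ones (4.6 and later) support #pragma GCC diagnostic push/ignored/pop around the offending line. Either way, if your coding standards say you shouldn't be doing something, you should not be looking for ways to defeat them.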
Another point, when you say:
void format_verbose(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
you are really wasting your time typing - it is clearer and more idiomatic to say:
void format_verbose(uint32_t my_index,
char my_buffer[])
or:
void format_verbose(uint32_t my_index,
char * my_buffer)
If all you ever do with snprintf() is copying strings (as seems to be the case), i.e. all you ever have is one or more "%s" as format string, why are you not using strcpy(), perhaps with a strlen() first to check the source's length?
Note that strncpy(), while looking like a good idea, isn't. It always pads the target buffer with zeroes, and in case the source exceeds the buffer size, doesn't null-terminate the string.
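A minimal sketch of the strlen()-then-strcpy() idea (copy_checked is a name made up here; refusing to copy at all is just one possible policy for over-long input):

```c
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
   Returns 0 on success, -1 if src is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize)
        return -1;
    strcpy(dst, src);   /* safe: we just checked that it fits */
    return 0;
}
```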
After a night of thought, I plan on manually dividing the input string up, by looking for the %s marker myself, and then sending the strings separately into snprintf() with their own %s. For this case, where only one type of format string is allowed, this is less onerous; however, it would suck to try to completely re-implement a printf() style parser.
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
char pre_description[BUFFER_SIZE];
char post_description[BUFFER_SIZE];
int32_t offset = -1;
offset = find_string(MY_ARRAY[my_index].description, "%s");
ASSERT(offset >=0, "No split location!");
// Use offset to copy the pre and post descriptions
// Exercise left to the reader :-)
snprintf(my_buffer, BUFFER_SIZE, "%s%s%s",
pre_description, some_string, post_description);
}
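One possible way to fill in that exercise without the intermediate buffers is to let "%.*s" print the part before the marker. This is only a sketch: format_with_data is a hypothetical stand-in, strstr takes the place of find_string, and it assumes the description contains a plain "%s" (a literal "%%s" in the text would be misdetected).

```c
#include <stdio.h>
#include <string.h>

/* Expands the first "%s" marker in description with data, using only
   literal format strings. Returns snprintf()'s result, or -1 if the
   description contains no marker. */
int format_with_data(char *out, size_t outsize,
                     const char *description, const char *data)
{
    const char *marker = strstr(description, "%s");

    if (marker == NULL)
        return -1;
    /* "%.*s" prints only the part before the marker; marker + 2 skips
       the two characters of "%s" itself. */
    return snprintf(out, outsize, "%.*s%s%s",
                    (int)(marker - description), description,
                    data, marker + 2);
}
```

Since the format string passed to snprintf is a literal, the compiler can check it and the non-literal-format warning should no longer apply.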