How to implement wrapper function for C sscanf() without using vsscanf() - c

I want to implement a wrapper function for C sscanf() without using vsscanf(), because my environment provides only sscanf(), not vsscanf(). I also don't want to write a complete implementation of sscanf(), because that would mean handling every possible scenario. I have seen some samples via Google, but they do not cover all scenarios.
So now I want to implement something like this:
int my_sscanf(char *buf, char format[], ...)
{
    va_list vargs;
    va_start(vargs, format);
    // some loop to get the variable arguments
    // and call sscanf() again here
    va_end(vargs);
    return 0; // placeholder for the number of items assigned
}

Ouch! Here's a hammer; it'll be more fun hitting yourself on the head with it. Seriously, that's a non-trivial proposition.
You'll need a loop that scans through the format string, reading characters from the buffer when they're normal characters, remembering that spaces in the format chew up zero or more spaces in the buffer. When you encounter a conversion specification, you'll need to create a singleton format string containing the user-supplied conversion specification plus a %n conversion specification. You'll invoke:
int pos;
int rc = sscanf(current_pos_in_buf, manufactured_format_with_percent_n,
appropriate_pointer_from_varargs, &pos);
If rc is not 1, you'll fail. Otherwise, you update the current position in the buffer using the value stored in pos, and then repeat. Note that scanning a conversion specification is not trivial. Also, if there is an assignment-suppressing * in the specification, you'll have to expect a 0 back from sscanf() (and not provide the appropriate pointer from the variable args).
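The loop described above can be sketched as follows. This is a minimal illustration rather than a complete replacement: it assumes the format contains only plain %d, %f and %s conversions (no widths, length modifiers, assignment suppression, %% or scansets), and it returns the number of items assigned without ever returning EOF. The name my_sscanf comes from the question; everything else is an assumption made for the example.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>

/* Sketch only: supports plain %d, %f and %s. Returns the number of items
 * assigned; unlike sscanf() it never returns EOF. */
int my_sscanf(const char *buf, const char *format, ...)
{
    va_list ap;
    int assigned = 0;

    va_start(ap, format);
    while (*format != '\0') {
        if (isspace((unsigned char)*format)) {
            /* White space in the format eats zero or more spaces in buf. */
            while (isspace((unsigned char)*buf))
                buf++;
            format++;
        } else if (*format != '%') {
            /* An ordinary character must match the input exactly. */
            if (*buf != *format)
                break;
            buf++;
            format++;
        } else {
            /* Build a one-conversion format such as "%d%n". */
            char spec[5];
            int pos = 0;
            int rc;

            spec[0] = '%';
            spec[1] = format[1];
            spec[2] = '%';
            spec[3] = 'n';
            spec[4] = '\0';

            /* Fetch the pointer with the type the conversion expects. */
            switch (format[1]) {
            case 'd': rc = sscanf(buf, spec, va_arg(ap, int *), &pos); break;
            case 'f': rc = sscanf(buf, spec, va_arg(ap, float *), &pos); break;
            case 's': rc = sscanf(buf, spec, va_arg(ap, char *), &pos); break;
            default:  rc = 0; break;   /* unsupported conversion */
            }
            if (rc != 1)
                break;
            assigned++;
            buf += pos;      /* %n reported how much input was consumed */
            format += 2;
        }
    }
    va_end(ap);
    return assigned;
}
```

A call such as my_sscanf("x=12 name=bob", "x=%d name=%s", &x, name) would then behave like the equivalent sscanf() call for these simple formats. Supporting the full conversion syntax means parsing flags, widths and length modifiers before choosing the pointer type, which is where most of the real work lies.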

Try telling your compiler to compile your code as C99. If that still doesn't work, your libc does not comply with the C99 standard – in that case, get a proper libc.
E.g. if you're using gcc, try adding -std=c99 to the compiler command line.

There's a slightly simpler way to do this using the preprocessor, but it's a little hacky, and the macro cannot hand back the return value of sscanf(). Take this as an example:
#define my_sscanf(buf, fmt, ...) do { \
    do_something(); \
    sscanf((buf), (fmt), __VA_ARGS__); \
    do_something_else(); \
} while (0)

Related

Is there any observable difference between printf(str) and fwrite(str, 1, strlen(str), stdout)?

When printing static, format-less strings, one of the common optimizations that C compilers perform is to transform calls like printf("foobar\n"); to the equivalent puts("foobar");. This is valid as long as the return value is not used (C specifies that printf returns the number of characters written on success, but that puts only returns a non-negative value on success). C compilers will also transform calls like fprintf(stdout, "foobar") to fwrite("foobar", 1, 6, stdout).
However, the printf to puts optimization only applies if the string ends with a newline, since puts automatically appends a newline. When it does not, I would expect that printf could be optimized to an equivalent fwrite, just like the fprintf case - but it seems like compilers don't do this. For example, the following code:
#include <stdio.h>
int main() {
    printf("test1\n");
    printf("test2");
    fprintf(stdout, "test3");
}
gets optimized to the following sequence of calls in assembly:
puts("test1");
printf("test2");
fwrite("test3", 1, 5, stdout);
My question is: why do compilers not optimize printf to fwrite or similar in the absence of a terminating newline? Is this simply a missed optimization, or is there a semantic difference between printf and fwrite when used with static, format-less strings? If relevant, I am looking for answers that would apply to C11 or any newer standard.
This is just a missed optimization—there's no reason why a compiler could not do the transform you imagine—but it's a motivated one. It is significantly easier for a compiler to convert a call to printf into a call to puts than a call to fputs or fwrite, because the latter two would require the compiler to supply stdout as an argument. stdout is a macro, and by the time the compiler gets around to doing library-call optimizations, macro definitions may no longer be available (even if the preprocessor is integrated) or it may not be possible to parse token sequences into AST fragments anymore.
In contrast, the compiler can easily turn fprintf into fputs, because it can use whatever was supplied as the FILE* argument to fprintf to call fputs with as well. But I would not be surprised by a compiler that could turn fprintf(stdout, "blah\n") into fputs("blah\n", stdout) but not into puts("blah") ... because it has no way of knowing that the first argument to this fprintf call is stdout. (Keep in mind that this optimization pass is working with the IR equivalent of &_iob[1] or some such.)

sscanf - varying number of format arguments?

In my program, I use sscanf to check whether a string is of a given format. To do so, I provide the number of arguments in the format string and check whether sscanf returns that same number when parsing the input.
As part of a primitive parser, I want to check whether a string matches one of many formats. The sscanf function is variadic, so how do I deal with the varying number of arguments I need to pass?
Currently, I just pass a very large number of arguments (e.g. 50) to the function, and just hope that the format strings don't contain more arguments.
Is there any better way to do this?
You really need something heavier than scanf. You have to tell scanf what format your input is in; it can't figure anything out on its own.
If you have access to POSIX, look at regex.h; it's probably everything you need.
Otherwise, you're stuck rolling your own. lex and yacc are nice if the format is rather complex, but otherwise, either strtok or (getchar+switch) is probably the way to go.
Edit:
Since you can use POSIX, here's a simple example of how to extract data from a regex in C. (Error checking excluded for brevity.)
char txt[] = "232343341235898dfsfgs/.f";
regex_t reg;
regmatch_t refs[MAX_REFS]; // MAX_REFS must be at least 2: refs[0] is the whole match, refs[1] the first group
regcomp(&reg, "3433\\([0-5]*\\).*", 0); // with REG_EXTENDED the group would be written with unescaped parentheses
regexec(&reg, txt, MAX_REFS, refs, 0);
regfree(&reg);
txt[refs[1].rm_eo] = '\0'; // terminate the string at the end of the group
int n = atoi(txt + refs[1].rm_so); // convert the text of the group
printf("%d\n", n);
Prints
41235
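With the error checking put back, the same idea might look like the sketch below. The helper name extract_group and its return convention are assumptions made for this example; regcomp, regexec and regfree are the standard POSIX calls. It copies the first group into a caller-supplied buffer instead of modifying the input.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first parenthesised group of a POSIX basic
 * regex match into out. Returns 0 on success, -1 on any failure. */
int extract_group(const char *text, const char *pattern, char *out, size_t outsz)
{
    regex_t reg;
    regmatch_t refs[2];         /* refs[0]: whole match, refs[1]: group 1 */
    size_t len;
    int rc = -1;

    if (regcomp(&reg, pattern, 0) != 0)
        return -1;              /* pattern did not compile */

    /* rm_so is -1 when the group did not take part in the match. */
    if (regexec(&reg, text, 2, refs, 0) == 0 && refs[1].rm_so != -1) {
        len = (size_t)(refs[1].rm_eo - refs[1].rm_so);
        if (len < outsz) {      /* leave room for the terminator */
            memcpy(out, text + refs[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&reg);
    return rc;
}
```

The extracted text can then be converted with strtol() or a similar function, which also lets the caller detect overflow.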
You should probably use lex/yacc to build a proper parser. Alternatively, first tokenizing the string with strtok might simplify your problem. (Beware: It is really tricky to use strtok correctly -- read its documentation very carefully.)
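As a rough illustration of the strtok approach and its pitfalls, here is a small sketch. The helper name split_tokens is made up for this example. Note that strtok modifies the buffer it scans (so it must not be given a string literal), treats runs of delimiters as a single separator (so empty fields disappear), and keeps hidden state between calls (so it is not reentrant).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on any character in delims and
 * stores up to max_tokens pointers in tokens. Returns the token count. */
size_t split_tokens(char *line, const char *delims, char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *tok = strtok(line, delims);   /* first call passes the buffer */

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, delims);     /* later calls pass NULL */
    }
    return count;
}
```

Splitting "a,b,,c" on "," this way yields three tokens, not four, because the empty field between the two commas is skipped.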
I'm not sure it answers your question, but you use varargs in C to allow a variable number of arguments to a function.
void myscanf(const char *fmt, ...)
{
}
The unhelpful answer is "don't do that, write a parser properly, maybe using lex and/or yacc or bison".
The answer to the question you asked is "yes, you could do that". I don't believe there's any reason why there can't be more variadic parameters than the format requires, although too few would be a bad thing. I'm presuming that you have an array or list of possible formats and you're calling sscanf in a loop.
You can write a validation function using the variable length arguments using the macros available in stdarg.h.
For example,
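int my_validation_func(const char *format, ...) {
    va_list ap;
    const char *p, *sval;
    int ival;
    double fval;
    va_start(ap, format);
    for (p = format; *p; p++) {
        if (*p != '%') {
            continue;
        }
        switch (*++p) {
        case 'd':
            ival = va_arg(ap, int);
            (void)ival;
            break;
        case 'f':
            /* float arguments are promoted to double in a variadic call */
            fval = va_arg(ap, double);
            (void)fval;
            break;
        case 's':
            for (sval = va_arg(ap, char *); *sval; sval++);
            break;
        case '\0':
            p--; /* format ended with a lone '%'; do not step past the end */
            break;
        default:
            break;
        }
    }
    va_end(ap);
    return 0;
}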
Hope this helps!
If you don't know the number and types of the arguments when you're writing the code, sscanf() cannot safely do what you're trying to do.
Passing 50 arguments to sscanf() is ok (arguments not consumed by the format string are evaluated but otherwise ignored), but the arguments that correspond to the format string have to be of the expected type, after promotion; otherwise, the behavior is undefined. So if you want to detect whether a string can be scanned with either "%d" or "%f", you can't safely do it with a single sscanf() call. (It's likely you could get away with passing a void* that points to a sufficiently large buffer, but the behavior is still undefined.)
Another nasty problem with sscanf() is that it doesn't handle numeric overflow. This:
char *s = "9999999999999999999999999";
int n;
int result = sscanf(s, "%d", &n);
printf("result = %d, n = %d\n", result, n);
has undefined behavior (assuming 9999999999999999999999999 is too big to be stored in an int).
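One way around the overflow problem is to convert with strtol(), which reports out-of-range input through errno instead of invoking undefined behavior. The sketch below is only an illustration; the helper name parse_int and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if s is a
 * decimal integer that fits in an int, and 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;               /* no digits, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* out of range for long or for int */
    *out = (int)v;
    return 1;
}
```

With this helper, the 25-digit string from the example above is rejected cleanly rather than producing an unpredictable value.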
Something you might be able to do is find an open-source sscanf implementation and modify it so it just verifies the string against the format, without storing anything. (Dealing with the license for the implementation is left as an exercise.) This makes sense if you find sscanf-style format strings particularly convenient for your problem. Otherwise, regular expressions are probably the way to go (not in the C standard, but it's easy enough to find an implementation).

Understanding C built-in library function implementations

So I was going through K&R second edition doing the exercises. Feeling pretty confident after doing a few exercises, I thought I'd check the actual implementations of these functions. It was then my confidence fled the scene. I could not understand any of it.
For example, I checked getchar():
Here is the prototype in libio/stdio.h
extern int getchar (void);
So I followed it through and got this:
__STDIO_INLINE int
getchar (void)
{
return _IO_getc (stdin);
}
Again I follow it to the libio/getc.c:
int
_IO_getc (fp)
FILE *fp;
{
int result;
CHECK_FILE (fp, EOF);
_IO_acquire_lock (fp);
result = _IO_getc_unlocked (fp);
_IO_release_lock (fp);
return result;
}
And I'm taken to another header file libio/libio.h, which is pretty cryptic:
#define _IO_getc_unlocked(_fp) \
(_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
Which is where I finally ended my journey.
My question is pretty broad. What does all this mean? I could not for the life of me figure out anything logical by looking at the code. It looks like a bunch of code abstracted away layer after layer.
More importantly, when does it really get the character from stdin?
_IO_getc_unlocked is an inlinable macro. The idea is that you can get a character from the stream without having to call a function, making it hopefully fast enough to use in tight loops, etc.
Let's take it apart one layer at a time. First, what is _IO_BE?
/usr/include/libio.h:# define _IO_BE(expr, res) __builtin_expect ((expr), res)
_IO_BE is a hint to the compiler, that expr will usually evaluate to res. It's used to structure code flow to be faster when the expectation is true, but has no other semantic effect. So we can get rid of that, leaving us with:
#define _IO_getc_unlocked(_fp) \
    ( ( (_fp)->_IO_read_ptr >= (_fp)->_IO_read_end ) \
    ? __uflow(_fp) : *(unsigned char *)(_fp)->_IO_read_ptr++ )
Let's turn this into an inline function for clarity:
inline int _IO_getc_unlocked(FILE *_fp) {
    if (_fp->_IO_read_ptr >= _fp->_IO_read_end)
        return __uflow(_fp);
    else
        return *(unsigned char *)(_fp->_IO_read_ptr++);
}
In short, we have a pointer into a buffer, and a pointer to the end of the buffer. We check if the pointer is outside the buffer; if not, we increment it and return whatever character was at the old value. Otherwise we call __uflow to refill the buffer and return the newly read character.
As such, this allows us to avoid the overhead of a function call until we actually need to do IO to refill the input buffer.
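To see the same fast-path/slow-path split without the glibc internals, here is a toy version. Every name in it (toy_stream, toy_open, toy_uflow, toy_getc) is made up for this sketch, and a plain string stands in for the underlying file; it is not how glibc is actually written, only the same shape.

```c
#include <stddef.h>
#include <string.h>

/* Toy stream: all names here are hypothetical, not glibc's. */
struct toy_stream {
    const char *source;         /* stands in for the underlying file */
    unsigned char buf[4];       /* deliberately tiny buffer */
    unsigned char *read_ptr;    /* next unread byte in buf */
    unsigned char *read_end;    /* one past the last valid byte in buf */
};

void toy_open(struct toy_stream *s, const char *source)
{
    s->source = source;
    s->read_ptr = s->read_end = s->buf;   /* buffer starts empty */
}

/* Slow path: refill the buffer, then return the first new byte or -1. */
int toy_uflow(struct toy_stream *s)
{
    size_t n = strlen(s->source);

    if (n == 0)
        return -1;                        /* end of input */
    if (n > sizeof s->buf)
        n = sizeof s->buf;
    memcpy(s->buf, s->source, n);
    s->source += n;
    s->read_ptr = s->buf;
    s->read_end = s->buf + n;
    return *s->read_ptr++;
}

/* Fast path: one comparison and one load while the buffer has data. */
#define toy_getc(s) \
    ((s)->read_ptr >= (s)->read_end ? toy_uflow(s) : *(s)->read_ptr++)
```

Reading "hello!" through toy_getc calls toy_uflow only three times (two refills and one end-of-input check); the other characters come straight out of the buffer.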
Keep in mind that standard library functions can be complicated like this; they can also use extensions to the C language (such as __builtin_expect) that are NOT standard and may NOT work on all compilers. They do this because they need to be fast, and because they can make assumptions about what compiler they're using. Generally speaking your own code should not use such extensions unless absolutely necessary, as it'll make porting to other platforms more difficult.
Going from pseudo-code to real code we can break it down:
if (there is a character in the buffer)
return (that character)
else
call a function to refill the buffer and return the first character
end
Let's use the ?: operator:
#define getc(f) (is_there_buffered_stuff(f) ? *pointer++ : refill())
A bit closer:
#define getc(f) (is_there_buffered_stuff(f) ? *f->pointer++ : refill(f))
Now we are almost there. To determine if there is something buffered already, it uses the file structure pointer and a read pointer within the buffer
_fp->_IO_read_ptr >= _fp->_IO_read_end ?
This actually tests the opposite condition to my pseudo-code, "is the buffer empty", and if so, it calls __uflow(_fp) // "underflow", otherwise, it just reaches directly into the buffer with a pointer, gets the character, and then increments the pointer:
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
I can highly recommend The Standard C Library by P.J. Plauger. He provides background on the standard and provides an implementation of every function. The implementation is simpler than what you'll see in glibc or a modern C compiler, but does still make use of macros like the _IO_getc_unlocked() you posted.
The macro is going to pull a character from buffered data (which could be the ungetc buffer) or read it from the stream (which may read and buffer multiple bytes).
The reason there is a standard library is that you should not need to know the exact implementation details of these functions. The code that implements the library calls at some point has to use nonstandard system calls, which have to deal with issues you may not be concerned with. If you are learning C, make sure you can understand other C programs before tackling the stdlib. Once you get a little more advanced, look at the stdlib, but it still won't make a lot of sense until you understand the system calls involved.
The definition of getchar() redefines the request as a specific request for a character from stdin.
The definition of _IO_getc() does a sanity check to make sure that the FILE* is valid (returning EOF if it is not), then it locks the stream to prevent other threads from corrupting the call to _IO_getc_unlocked().
The macro definition of _IO_getc_unlocked() simply checks whether the read pointer is at or past the end of the buffered data, and either calls __uflow if it is, or returns the char at the read pointer if it is not.
This is standard stuff for all stdlib implementations. You are not supposed to ever look at it. In fact, many stdlib implementations will use assembly language for optimal processing, which is even more cryptic.

Cannot create a program which will invert string

I am using Linux.
I am trying to write a program in c that will print a string backward.
Here is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (){
char string[100];
printf ("Enter string:\n");
gets (string);
int length = strlen (string)-1;
for (length = length; length>=0; length--){
puts (string[length]);
}
}
And here is the error:
a.c:10: warning: passing argument 1 of ‘puts’ makes pointer from integer without a cast
/usr/include/stdio.h:668: note: expected ‘const char *’ but argument is of type ‘char’
/tmp/cc5rpeG7.o: In function `main':
a.c:(.text+0x29): warning: the `gets' function is dangerous and should not be used.
What should I do?
Forget that the function gets() exists - it is lethal. Use fgets() instead (but note that it does not remove the newline at the end of the line).
You want to put a single character at a time: use putchar() to write it to stdout. Don't forget to add a newline to the output after the loop.
Also, for (length = length; length >= 0; length--) is not idiomatic C. Use one of:
for ( ; length >= 0; length--)
for (length = strlen(string) - 1; length >= 0; length--)
for (int length = strlen(string) - 1; length >= 0; length--)
The last alternative uses a feature added to C99 (which was available in C++ long before).
Also, we could debate whether length is the appropriate name for the variable. It would be better renamed as i or pos or something similar because, although it is initialized to the length of the input, it is actually used as an array index, not as the length of anything.
Subjective: Don't put a space between the name of a function and its parameter list. The founding fathers of C don't do that - neither should you.
Why is gets() lethal?
The first Internet worm - the Morris worm from 1988 - exploited the fingerd program that used gets() instead of fgets(). Since then, numerous programs have been crashed because they used gets() and not fgets() or another alternative.
The fundamental problem is that gets() does not know how much space is available to store the data it reads. This leads to 'buffer overflows', a term which can be searched for in your favourite search engine that will return an enormous number of entries.
If someone types 150 characters of input to the example program, then gets() will store 150 characters in the array which has length 100. This never leads to happiness - it usually leads to a core dump, but with carefully chosen inputs - often generated by a Perl or Python script - you can probably get the program to execute arbitrary other code. This really matters if the program will ever be run by a user with 'elevated privileges'.
Incidentally, gets() is likely to be removed from the Standard C library in the next release (C1x - see n1494 from WG14). It won't vanish from actual C libraries for a long time yet (20 years?), but it should be replaced with this implementation (or something similar):
#undef NDEBUG
#include <assert.h>
char *gets(char *buffer)
{
assert("Probability of using gets() safely" == 0);
}
One other minor detail, discussed in part under the comments to the main question.
The code shown is clearly for C99; the declaration of length part way through the function is invalid in C89. Given that, it is 'OK' for the main() function not to explicitly return a value, because the C99 standard follows the lead of the C++ standard and allows you to omit the return from main() and the effect is the same as return(0); or return 0; at the end.
As such, the program in this question cannot strictly be faulted for not having a return at the end. However, I regard that as one of the more peculiar standardizing decisions, and would much prefer it if the standards had left that provision out - or done something more radical like allowing the ubiquitous but erroneous void main() observing that when control returns from that, the result is that a success status is returned to the environment. It isn't worth fighting to get that aspect of the standard changed - sadly - but as a personal style decision, I don't take advantage of the licence granted to omit the final return from main(). If the code has to work with C89 compilers, it should have the explicit return 0; at the end (but then the declaration of length has to be fixed too).
You can also use recursion to do it. I think it looks nicer than when using a loop.
Just call the function with your string, and before printing the char in the function, call the function again with the same string, minus the first char.
This will print out your string in reversed order.
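A minimal sketch of that recursive idea, assuming a hypothetical function name print_reversed_rec and an explicit output stream parameter:

```c
#include <stdio.h>

/* Prints s to out in reverse: recurse on the tail first, then emit the
 * first character on the way back up. */
void print_reversed_rec(const char *s, FILE *out)
{
    if (*s == '\0')
        return;                 /* empty string: nothing to print */
    print_reversed_rec(s + 1, out);
    fputc(*s, out);
}
```

Calling print_reversed_rec(string, stdout) prints the reversed text. The recursion depth equals the length of the string, so the loop version uses less stack for long inputs.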
First:
NEVER NEVER NEVER NEVER NEVER use gets(); it will introduce a point of failure in your code. There's no way to tell gets() how big the target buffer is, so if you pass a buffer sized to hold 10 characters and there are 100 characters in the input stream, gets() will happily store those extra 90 characters in the memory beyond the end of your buffer, potentially clobbering something important. Buffer overruns are an easy malware exploit; the Morris worm specifically exploited a gets() call in fingerd.
Use fgets() instead; it allows you to specify the maximum number of characters to read from the input stream. However, unlike gets(), fgets() will save the terminating newline character to the buffer if there's room for it, so you have to account for that:
char string[100];
char *newline;
printf("Enter a string: ");
fflush(stdout);
fgets(string, sizeof string, stdin);
newline = strchr(string, '\n'); // search for the newline character
if (newline) // if it's present
*newline = 0; // set it to zero
Now that's out of the way...
Your error is coming from the fact that puts() expects an argument of type char *, but you're passing an argument of type char, hence the "pointer from integer without cast" message (char is an integral type). To write a single character to stdout, use putchar() or fputc().
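Putting these pieces together (fgets() instead of gets(), stripping the newline, and writing one character at a time), a corrected version might look like the sketch below. The function name reverse_line and the stream parameters are choices made for this example, not part of the original program.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in, strips the newline, and writes it reversed to out.
 * Returns 0 on success, -1 if no line could be read. */
int reverse_line(FILE *in, FILE *out)
{
    char string[100];
    size_t i;

    if (fgets(string, sizeof string, in) == NULL)
        return -1;
    string[strcspn(string, "\n")] = '\0';   /* drop the newline, if any */

    /* Count down with an unsigned index; i - 1 is the character to print. */
    for (i = strlen(string); i > 0; i--)
        putc(string[i - 1], out);
    putc('\n', out);
    return 0;
}
```

A main() that prints a prompt and then calls reverse_line(stdin, stdout) reproduces the intended behavior of the program in the question.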
You should use putchar instead of puts
So this loop:
for (length = length; length>=0; length--){
puts (string[length]);
}
Will be:
for (length = length; length>=0; length--){
putchar (string[length]);
}
putchar will take a single char as a parameter and print it to stdout, which is what you want. puts, on the other hand, will print the whole string to stdout. So when you pass a single char to a function that expects a whole string (a null-terminated char array), the compiler complains.
Use putc or putchar, as puts is specified to take a char* and you are feeding it a char.

Resuming [vf]?nprintf after reaching the limit

I have an application which prints strings to a buffer using snprintf and vsnprintf. Currently, if it detects an overflow, it appends a > to the end of the string as a sign that the string was chopped and prints a warning to stderr. I'm trying to find a way to have it resume the string [from where it left off] in another buffer.
If this was using strncpy, it would be easy; I know how many bytes were written, and so I can start the next print from p + bytes_written. However, with printf, I have two problems: first, the formatting specifiers may take up more or less space in the final string than in the format string, and secondly, my va_list may be partially parsed.
Does anyone have an easy-ish solution to this?
EDIT: I should probably clarify that I'm working on an embedded system with limited memory + no dynamic allocation [i.e., I don't want to use dynamic allocation]. I can print messages of 255 bytes, but no more, although I can print as many of those as I want. I don't, however, have the memory to allocate lots of memory on the stack, and my print function needs to be thread-safe, so I can't allocate just one global / static array.
I don't think you can do what you're looking for (other than by the straightforward way of reallocating the buffer to the necessary size and performing the entire operation again).
The reasons you listed are a couple contributors to this, but the real killer is that the formatter might have been in the middle of formatting an argument when it ran out of space, and there's no reasonable way to restart that.
For example, say there are 3 bytes left in the buffer, and the formatter starts working on a "%d" conversion for the value -1234567. It'll put "-1\0" into the buffer, then do whatever else it needs to do to return the size of buffer you really need.
In addition to you being able to determine which specifier the formatter was working on, you'd need to be able to figure out that instead of passing in -1234567 on the second round you need to pass in 234567. I defy you to come up with a reasonable way to do that.
Now if there's a real reason you don't want to restart the operation from the top, you probably could wrap the snprintf()/vsnprintf() call with something that breaks down the format string, sending only a single conversion specifier at a time and concatenating that result to the output buffer. You'd have to come up with some way for the wrapper to keep some state across retries so it knows which conversion spec to pick up from.
So maybe it's doable in a sense, but it sure seems like it would be an awful lot of work to avoid the much simpler 'full retry' scheme. I could see maybe (maybe) trying this on a system where you don't have the luxury of dynamically allocating a larger buffer (an embedded system, maybe). In that case, I'd probably argue that what's needed is a much simpler/restricted scope formatter that doesn't have all the flexibility of printf() formatters and can handle retrying (because their scope is more limited).
But, man, I would try very hard to talk some sense into whoever said it was a requirement.
Edit:
Actually, I take some of that back. If you're willing to use a customized version of snprintf() (let's call it snprintf_ex()) I could see this being a relatively simple operation:
int snprintf_ex( char* s, size_t n, size_t skipChars, const char* fmt, ...);
snprintf_ex() (and its companion functions such as vsnprintf_ex()) will format the string into the provided buffer (as usual) but will skip outputting the first skipChars characters.
You could probably rig this up pretty easy using the source from your compiler's library (or using something like Holger Weiss' snprintf()) as a starting point. Using this might look something like:
int bufSize = sizeof(buf);
char* fmt = "some complex format string...";
int needed = snprintf_ex( buf, bufSize, 0, fmt, arg1, arg2, etc, etc2);
if (needed >= bufSize) {
    // dang truncation...
    // do whatever you want with the truncated bits (send to a logger or whatever)
    // format the rest of the string, skipping the bits we already got
    needed = snprintf_ex( buf, bufSize, bufSize - 1, fmt, arg1, arg2, etc, etc2);
    // now the buffer contains the part that was truncated before. Note that
    // you'd still need to deal with the possibility that this is truncated yet
    // again - that's an exercise for the reader, and it's probably trickier to
    // deal with properly than it might sound...
}
One drawback (that might or might not be acceptable) is that the formatter will do all the formatting work over again from the start - it'll just throw away the first skipChars characters that it comes up with. If I had to use something like this, I'd think that would almost certainly be an acceptable thing (it's what happens when someone deals with truncation using the standard snprintf() family of functions).
The C99 functions snprintf() and vsnprintf() both return the number of characters needed to print the whole format string with all the arguments.
If your implementation conforms to C99, you can create an array large enough for your output strings then deal with them as needed.
int chars_needed = snprintf(NULL, 0, fmt_string, v1, v2, v3, ...);
char *buf = malloc(chars_needed + 1);
if (buf) {
    snprintf(buf, chars_needed + 1, fmt_string, v1, v2, v3, ...);
    /* use buf */
    free(buf);
} else {
    /* no memory */
}
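Where dynamic allocation is not an option, as in the question's embedded case, the same C99 return value can at least be used to detect truncation in a fixed buffer. The sketch below reproduces the question's "append a >" convention; the helper name format_marked and its return convention are assumptions made for this example, and it relies on vsnprintf() behaving as C99 specifies.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats into buf (size n, n >= 2). If the output did
 * not fit, the last character kept is replaced with '>' as a truncation
 * mark. Returns 1 if truncated, 0 if not, -1 on a formatting error. */
int format_marked(char *buf, size_t n, const char *fmt, ...)
{
    va_list ap;
    int needed;

    va_start(ap, fmt);
    needed = vsnprintf(buf, n, fmt, ap);
    va_end(ap);

    if (needed < 0)
        return -1;
    if ((size_t)needed >= n) {
        buf[n - 2] = '>';       /* buf[n - 1] is already '\0' */
        return 1;
    }
    return 0;
}
```

This only reports that output was lost; it does not resume formatting, which still needs one of the approaches discussed above.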
If you're on a POSIX-ish system (which I'm guessing you may be since you mentioned threads), one nice solution would be:
First try printing the string to a single buffer with snprintf. If it doesn't overflow, you've saved yourself a lot of work.
If that doesn't work, create a new thread and a pipe (with the pipe() function), fdopen the writing end of the pipe, and use vfprintf to write the string. Have the new thread read from the reading end of the pipe and break the output string into 255-byte messages. Close the pipe and join with the thread after vfprintf returns.
