Resuming [vf]?nprintf after reaching the limit - c

I have an application which prints strings to a buffer using snprintf and vsnprintf. Currently, if it detects an overflow, it appends a > to the end of the string as a sign that the string was chopped and prints a warning to stderr. I'm trying to find a way to have it resume the string [from where it left off] in another buffer.
If this was using strncpy, it would be easy; I know how many bytes were written, and so I can start the next print from *(p+bytes_written); However, with printf, I have two problems; first, the formatting specifiers may take up more or less space in the final string as in the format string, and secondly, my valist may be partially parsed.
Does anyone have an easy-ish solution to this?
EDIT: I should probably clarify that I'm working on an embedded system with limited memory + no dynamic allocation [i.e., I don't want to use dynamic allocation]. I can print messages of 255 bytes, but no more, although I can print as many of those as I want. I don't, however, have the memory to allocate lots of memory on the stack, and my print function needs to be thread-safe, so I can't allocate just one global / static array.

I don't think you can do what you're looking for (other than by the straightforward way of reallocating the buffer to the necessary size and performing the entire operation again).
The reasons you listed are a couple contributors to this, but the real killer is that the formatter might have been in the middle of formatting an argument when it ran out of space, and there's no reasonable way to restart that.
For example, say there's 3 bytes left in the buffer, and the formatter starts working on a "%d" conversion for the value -1234567. It ll put "-1\0" into the buffer then do whatever else it needs to do to return the size of buffer you really need.
In addition to you being able to determine which specifier the formatter was working on, you'd need to be able to figure out that instead of passing in -1234567 on the second round you need to pass in 234567. I defy you to come up with a reasonable way to do that.
Now if there's a real reason you don't want to restart the operation from the top, you probably could wrap the snprintf()/vsnprintf() call with something that breaks down the format string, sending only a single conversion specifier at a time and concatenating that result to the output buffer. You'd have to come up with some way for the wrapper to keep some state across retries so it knows which conversion spec to pick up from.
So maybe it's doable in a sense, but it sure seems like it would be an awful lot of work to avoid the much simpler 'full retry' scheme. I could see maybe (maybe) trying this on a system where you don't have the luxury of dynamically allocating a larger buffer (an embedded system, maybe). In that case, I'd probably argue that what's needed is a much simpler/restricted scope formatter that doesn't have all the flexibility of printf() formatters and can handle retrying (because their scope is more limited).
But, man, I would try very hard to talk some sense into whoever said it was a requirement.
Edit:
Actually, I take some of that back. If you're willing to use a customized version of snprintf() (let's call it snprintf_ex()) I could see this being a relatively simple operation:
int snprintf_ex( char* s, size_t n, size_t skipChars, const char* fmt, ...);
snprintf_ex() (and its companion functions such as vsnprintf()) will format the string into the provided buffer (as usual) but will skip outputting the first skipChars characters.
You could probably rig this up pretty easy using the source from your compiler's library (or using something like Holger Weiss' snprintf()) as a starting point. Using this might look something like:
int bufSize = sizeof(buf);
char* fmt = "some complex format string...";
int needed = snprintf_ex( buf, bufSize, 0, fmt, arg1, arg2, etc, etc2);
if (needed >= bufSize) {
// dang truncation...
// do whatever you want with the truncated bits (send to a logger or whatever)
// format the rest of the string, skipping the bits we already got
needed = snprintf_ex( buf, bufSize, bufSize - 1, fmt, arg1, arg2, etc, etc2);
// now the buffer contains the part that was truncated before. Note that
// you'd still need to deal with the possibility that this is truncated yet
// again - that's an exercise for the reader, and it's probably trickier to
// deal with properly than it might sound...
}
One drawback (that might or might not be acceptable) is that the formatter will do all the formatting work over again from the start - it'll just throw away the first skipChars characters that it comes up with. If I had to use something like this, I'd think that would almost certainly be an acceptable thing (it what happens when someone deals with truncation using the standard snprintf() family of functions).

The C99 functions snprintf() and vsnprintf() both return the number of characters needed to print the whole format string with all the arguments.
If your implementation conforms to C99, you can create an array large enough for your output strings then deal with them as needed.
int chars_needed = snprintf(NULL, 0, fmt_string, v1, v2, v3, ...);
char *buf = malloc(chars_needed + 1);
if (buf) {
snprintf(buf, chars_needed + 1, fmt_string, v1, v2, v3, ...);
/* use buf */
free(buf);
} else {
/* no memory */
}

If you're on a POSIX-ish system (which I'm guessing you may be since you mentioned threads), one nice solution would be:
First try printing the string to a single buffer with snprintf. If it doesn't overflow, you've saved yourself a lot of work.
If that doesn't work, create a new thread and a pipe (with the pipe() function), fdopen the writing end of the pipe, and use vfprintf to write the string. Have the new thread read from the reading end of the pipe and break the output string into 255-byte messages. Close the pipe and join with the thread after vfprintf returns.

Related

May printf (or fprintf or dprintf) return ("successfully") less (but nonnegative) than the number of "all bytes"?

The manual says that
Upon successful return, these functions [printf, dprintf etc.] return the number of characters printed.
The manual does not mention whethet may this number less (but yet nonnegative) than the length of the "final" (substitutions and formattings done) string. Nor mentions that how to check whether (or achieve that) the string was completely written.
The dprintf function operates on file descriptor. Similarily to the write function, for which the manual does mention that
On success, the number of bytes written is returned (zero indicates nothing was written). It is not an error if this number is smaller than the number of bytes requested;
So if I want to write a string completely then I have to enclose the n = write() in a while-loop. Should I have to do the same in case of dprintf or printf?
My understanding of the documentation is that dprintf would either fail or output all the output. But I agree that it is some gray area (and I might not understand well); I'm guessing that a partial output is some kind of failure (so returns a negative size).
Here is the implementation of musl-libc:
In stdio/dprintf.c the dprintf function just calls vdprintf
But in stdio/vdprintf.c you just have:
static size_t wrap_write(FILE *f, const unsigned char *buf, size_t len)
{
return __stdio_write(f, buf, len);
}
int vdprintf(int fd, const char *restrict fmt, va_list ap)
{
FILE f = {
.fd = fd, .lbf = EOF, .write = wrap_write,
.buf = (void *)fmt, .buf_size = 0,
.lock = -1
};
return vfprintf(&f, fmt, ap);
}
So dprintf is returning a size like vfprintf (and fprintf....) does.
However, if you really are concerned, you'll better use snprintf or asprintf to output into some memory buffer, and explicitly use write(2) on that buffer.
Look into stdio/__stdio_write.c the implementation of __stdio_write (it uses writev(2) with a vector of two data chunks in a loop).
In other words, I would often not really care; but if you really need to be sure that every byte has been written as you expect it (for example if the file descriptor is some HTTP socket), I would suggest to buffer explicitly (e.g. by calling snprintf and/or asprintf) yourself, then use your explicit write(2).
PS. You might check yourself the source code of your particular C standard library providing dprintf; for GNU glibc see notably libio/iovdprintf.c
With stdio, returning the number of partially written bytes doesn't make much sense because stdio functions work with a (more or less) global buffer whose state is unknown to you and gets dragged in from previous calls.
If stdio functions allowed you to work with that, the error return values would need to be more complex as they would not only need to communicate how many characters were or were not outputted, but also whether the failure was before your last input somewhere in the buffer, or in the middle of your last input and if so, how much of the last input got buffered.
The d-functions could theoretically give you the number of partially written characters easy, but POSIX specifies that they should mirror the stdio functions and so they only give you a further unspecified negative value on error.
If you need more control, you can use the lower level functions.
Concerning printf(), it is quite clear.
The printf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred. C11dr ยง7.21.6.3 3
A negative value is returned if an error occurred. In that case 0 or more characters may have printed. The count is unknowable via the standard library.
If the value return is not negative, that is the number sent to stdout.
Since stdout is often buffered, that may not be the number received at the output device on the conclusion of printf(). Follow printf() with a fflush(stdout)
int r1 = printf(....);
int r2 = fflush(stdout);
if (r1 < 0 || r2 != 0) Handle_Failure();
For finest control, "print" to a buffer and use putchar() or various non-standard functions.
My bet is that no. (After looking into the - obfuscated - source of printf.) So any nonnegative return value means that printf was fully succesful (reached the end of the format string, everything was passed to kernel buffers).
But some (authentic) people should confirm it.

Why would one add 1 or 2 to the second argument of snprintf?

What is the role of 1 and 2 in these snprintf functions? Could anyone please explain it
snprintf(argv[arg++], strlen(pbase) + 2 + strlen("ivlpp"), "%s%ccivlpp", pbase, sep);
snprintf(argv[arg++], strlen(defines_path) + 1, "-F\"%s\"", defines_path);
The role of the +2 is to allow for a terminal null and the embedded character from the %c format, so there is exactly the right amount of space for formatting the first string. but (as 6502 points out), the actual string provided is one space shorter than needed because the strlen("ivlpp") doesn't match the civlpp in the format itself. This means that the last character (the second 'p') will be truncated in the output.
The role of the +1 is also to cause snprintf() to truncate the formatted data. The format string contains 4 literal characters, and you need to allow for the terminal null, so the code should allocate strlen(defines)+5. As it is, the snprintf() truncates the data, leaving off 4 characters.
I'm dubious about whether the code really works reliably...the memory allocation is not shown, but will have to be quite complex - or it will have to over-allocate to ensure that there is no danger of buffer overflow.
Since a comment from the OP says:
I don't know the use of snprintf()
int snprintf(char *restrict s, size_t n, const char *restrict format, ...);
The snprintf() function formats data like printf(), but it writes it to a string (the s in the name) instead of to a file. The first n in the name indicates that the function is told exactly how long the string is, and snprintf() therefore ensures that the output data is null terminated (unless the length is 0). It reports how long the string should have been; if the reported value is longer than the value provided, you know the data got truncated.
So, overall, snprintf() is a relatively safe way of formatting strings, provided you use it correctly. The examples in the question do not demonstrate 'using it correctly'.
One gotcha: if you work on MS Windows, be aware that the MSVC implementation of snprintf() does not exactly follow the C99 standard (and it looks a bit as though MS no longer provides snprintf() at all; only various alternatives such as _snprintf()). I forget the exact deviation, but I think it means that the string is not properly null-terminated in all circumstances when it should be longer than the space provided.
With locally defined arrays, you normally use:
nbytes = snprintf(buffer, sizeof(buffer), "format...", ...);
With dynamically allocated memory, you normally use:
nbytes = snprintf(dynbuffer, dynbuffsize, "format...", ...);
In both cases, you check whether nbytes contains a non-negative value less than the size argument; if it does, your data is OK; if the value is equal to or larger, then your data got chopped (and you know how much space you needed to allocate).
The C99 standard says:
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a negative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
The programmer whose code you are reading doesn't know how to use snprintf properly. The second argument is the buffer size, so it should almost always look like this:
snprintf(buf, sizeof buf, "..." ...);
The above is for situations where buf is an array, not a pointer. In the latter case you have to pass the buffer size along:
snprintf(buf, bufsize, "...", ...);
Computing the buffer size is unneeded.
By the way, since you tagged the question as qt-related. There is a very nice QString class that you should use instead.
At a first look both seem incorrect.
In the first case the correct computation would be path + sep + name + NUL so 2 would seem ok, but for the name the strlen call is using ilvpp while the formatting code is using instead cilvpp that is one char longer.
In the second case the number of chars added is 4 (-L"") so the number to add should be 5 because of the ending NUL.

K&R 3-2 overflow problem

I'm going through K&R and 3-2 looks like it would be easy to get into a buffer overflow
Write a function escape(s,t) that converts characters like newline and tab into visible escape sequences like \n and \t as it copies the string t to s. Use a switch
If I replace the byte '\n' with '\' and 'n', the size of s could potentially be quite a bit bigger than the source string.
I could just write this program and ignore the overflow but I would rather not.
I'm having issue wrapping my head around how to handle this?
I'm thinking having a fixed buffer size, perhaps something out of limit.h, and flushing the buffer to stdio when it gets full?
I believe the entire point of the exercise is to teach you that when you're dealing with something like this you either need to:
Shoot too high (make a buffer double the size of the original)
Take extra time (an extra pass) and pre-compute the required size of the buffer.
s will never be longer than twice the length of t. Since this is an exercise apparently meant to help you learn to use switch, I think it'd be fine to assume that the caller passes a string in s that's of sufficient length. Or, if s is of type char** (or similar), then you're meant to allocate the string, in which case you can allocate a string of the proper size.
In a real-world function, you'd probably have another parameter that indicates the maximum length for the destination string.
try adding a size parameter, so you know the size of the target buffer. If you pass a pointer to that parameter, you can return some kind of error value if the buffer is too small and pass the needed size back through the size param. Something like:
int escape(size_t *size, char *out, const char *in);

Different ways to calculate string length

A comment on one of my answers has left me a little puzzled. When trying to compute how much memory is needed to concat two strings to a new block of memory, it was said that using snprintf was preferred over strlen, as shown below:
size_t length = snprintf(0, 0, "%s%s", str1, str2);
// preferred over:
size_t length = strlen(str1) + strlen(str2);
Can I get some reasoning behind this? What is the advantage, if any, and would one ever see one result differ from the other?
I was the one who said it, and I left out the +1 in my comment which was written quickly and carelessly, so let me explain. My point was merely that you should use the pattern of using the same method to compute the length that will eventually be used to fill the string, rather than using two different methods that could potentially differ in subtle ways.
For example, if you had three strings rather than two, and two or more of them overlapped, it would be possible that strlen(str1)+strlen(str2)+strlen(str3)+1 exceeds SIZE_MAX and wraps past zero, resulting in under-allocation and truncation of the output (if snprintf is used) or extremely dangerous memory corruption (if strcpy and strcat are used).
snprintf will return -1 with errno=EOVERFLOW when the resulting string would be longer than INT_MAX, so you're protected. You do need to check the return value before using it though, and add one for the null terminator.
If you only need to determine how big would be the concatenation of the two strings, I don't see any particular reason to prefer snprintf, since the minimum operations to determine the total length of the two strings is what the two strlen calls do. snprintf will almost surely be slower, because it has to check the parameters and parse the format string besides just walking the two strings counting the characters.
... but... it may be an intelligent move to use snprintf if you are in a scenario where you want to concatenate two strings, and have a static, not too big buffer to handle normal cases, but you can fallback to a dynamically allocated buffer in case of big strings, e.g.:
/* static buffer "big enough" for most cases */
char buffer[256];
/* pointer used in the part where work on the string is actually done */
char * outputStr=buffer;
/* try to concatenate, get the length of the resulting string */
int length = snprintf(buffer, sizeof(buffer), "%s%s", str1, str2);
if(length<0)
{
/* error, panic and death */
}
else if(length>sizeof(buffer)-1)
{
/* buffer wasn't enough, allocate dynamically */
outputStr=malloc(length+1);
if(outputStr==NULL)
{
/* allocation error, death and panic */
}
if(snprintf(outputStr, length, "%s%s", str1, str2)<0)
{
/* error, the world is doomed */
}
}
/* here do whatever you want with outputStr */
if(outputStr!=buffer)
free(outputStr);
One advantage would be that the input strings are only scanned once (inside the snprintf()) instead of twice for the strlen/strcpy solution.
Actually, on rereading this question and the comment on your previous answer, I don't see what the point is in using sprintf() just to calculate the concatenated string length. If you're actually doing the concatenation, my above paragraph applies.
You need to add 1 to the strlen() example. Remember you need to allocate space for nul terminating byte.
So snprintf( ) gives me the size a string would have been. That means I can malloc( ) space for that guy. Hugely useful.
I wanted (but did not find until now) this function of snprintf( ) because I format tons of strings for output later; but I wanted not to have to assign static bufs for the outputs because it's hard to predict how long the outputs will be. So I ended up with a lot of 4096-long char arrays :-(
But now -- using this newly-discovered (to me) snprintf( ) char-counting function, I can malloc( ) output bufs AND sleep at night, both.
Thanks again and apologies to the OP and to Matteo.
EDIT: random, mistaken nonsense removed. Did I say that?
EDIT: Matteo in his comment below is absolutely right and I was absolutely wrong.
From C99:
2 The snprintf function is equivalent to fprintf, except that the output is written into
an array (specified by argument s) rather than to a stream. If n is zero, nothing is written,
and s may be a null pointer. Otherwise, output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a neg ative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.
Thank you, Matteo, and I apologize to the OP.
This is great news because it gives a positive answer to a question I'd asked here only a three weeks ago. I can't explain why I didn't read all of the answers, which gave me what I wanted. Awesome!
The "advantage" that I can see here is that strlen(NULL) might cause a segmentation fault, while (at least glibc's) snprintf() handles NULL parameters without failing.
Hence, with glibc-snprintf() you don't need to check whether one of the strings is NULL, although length might be slightly larger than needed, because (at least on my system) printf("%s", NULL); prints "(null)" instead of nothing.
I wouldn't recommend using snprintf() instead of strlen() though. It's just not obvious. A much better solution is a wrapper for strlen() which returns 0 when the argument is NULL:
size_t my_strlen(const char *str)
{
return str ? strlen(str) : 0;
}

Help gcc to not warn about not using a string literal format string

I'm creating a function in C to convert an index value into a string, which is a verbose description of the "field" represented by the index.
So, I have a nice array with all the verbose descriptions indexed by, well the index.
To dump it into a buffer I use code like this
#define BUFFER_SIZE 40
void format_verbose(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
snprintf(mY_buffer, BUFFER_SIZE, "%s", MY_ARRAY[my_index].description);
}
The problem comes for some cases I need to insert some other strings into the string when formatting it. So what I want is something like this (where the description in this case contains a %s).
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
// ...
snprintf(mY_buffer, BUFFER_SIZE, MY_ARRAY[my_index].description,
some_string);
}
Our make file is set up to make this (dangerous) use of snprintf() warn, and warnings are treated as errors. So, it won't compile. I would like to turn off the warning for just this line, where although it is somewhat dangerous, I will control the string, and I can test to ensure it works with every value it's called with.
Alternatively, I would be happy to do this some other way, but I'm really not keen to use this solution
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
// ...
snprintf(mY_buffer, BUFFER_SIZE, "%s%s%s"
MY_ARRAY[my_index].description1, some_string,
MY_ARRAY[my_index].description2);
}
Because it makes my description array ugly, especially for the ones where I don't need to add extra values.
GCC doesn't have the ability to turn off warnings on a line by line basis, so I suspect you are out of luck. And anyway, if your coding standards say you shouldn't be doing something, you should not be looking for ways to defeat them.
Another point, when you say:
void format_verbose(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
you are really wasting your time typing - it is clearer and more idiomatic to say:
void format_verbose(uint32_t my_index,
char my_buffer[])
or:
void format_verbose(uint32_t my_index,
char * my_buffer)
If all you ever do with snprintf() is copying strings (as seems to be the case), i.e. all you ever have is one or more "%s" as format string, why are you not using strcpy(), perhaps with a strlen() first to check the source's length?
Note that strncpy(), while looking like a good idea, isn't. It always pads the target buffer with zeroes, and in case the source exceeds the buffer size, doesn't null-terminate the string.
After a night of thought, I plan on manually dividing the input string up, by looking for the %s marker myself, and then sending the strings separetley into snprintf() with their own %s. For this case, where only 1 type of format string is allowed, this less onerous, however it would suck to try to completely re-implement a printf() style parser.
void format_verbose_with_data(uint32_t my_index,
char my_buffer[BUFFER_SIZE])
{
char pre_description[BUFFER_SIZE];
char post_description[BUFFER_SIZE];
int32_t offset = -1;
offset = find_string(MY_ARRAY[my_index].description, "%s");
ASSERT(offset >=0, "No split location!");
// Use offset to copy the pre and post descriptions
// Exercise left to the reader :-)
snprintf(mY_buffer, BUFFER_SIZE, "%s%s%s"
pre_description, some_string, post_description);
}

Resources