Format strings safely when vsnprintf is not available - c

I am writing code that needs to format a string, and I want to avoid buffer overruns.
I know that if vsnprintf is available (C99 onwards) we can do:
char* formatString(const char *format, ...)
{
char* result = NULL;
va_list ap;
va_start(ap, format);
/* Get the size of the formatted string by getting vsnprintf return the
* number of remaining characters if we ask it to write 0 characters */
int size = vsnprintf(NULL, 0, format, ap);
if (size > 0)
{
/* String formatted just fine */
result = (char *) calloc(size + 1, sizeof(char));
vsnprintf(result, size + 1, format, ap);
}
va_end(ap);
return result;
}
I can't figure out a way of doing something similar in C90 (without vsnprintf). If it turns out to not be possible without writing extremely complex logic I'd be happy to set a maximum length for the result, but I'm not sure how that could be achieved either without risking a buffer overrun.

Pre-C99 affords no simple solution for formatting strings with a high degree of safety against buffer overruns.
It is those pesky "%s", "%[]", "%f" format specifiers, with their potentially long output, that require so much careful consideration. Hence the need for such a function. (@Jonathan Leffler)
To do so with those early compilers, code must analyze the format and the arguments to find the required size. At that point, the code is nearly there to making your own complete my_vsnprintf(). I'd seek existing solutions for that. (@user694733)
Even with C99, there are environmental limits for *printf().
The number of characters that can be produced by any single conversion shall be at least 4095. C11dr §7.21.6.1 15
So any code that tries to char buf[10000]; snprintf(buf, sizeof buf, "%s", long_string); risks problems even with a sufficient buf[] yet with strlen(long_string) > 4095.
This implies that quick-and-dirty code could count the % characters and the format length, and make the reasonable assumption that the size needed does not exceed:
size_t sz = 4095*percent_count + strlen(format) + 1;
Of course, further analysis of the specifiers could lead to a tighter sz. Continuing down this path, we end up writing our own my_vsnprintf().
Even with your own my_vsnprintf() the safety is only so good. There is no run-time check that the format (which may be dynamic) matches the following arguments. To do so requires a new approach.
Cheeky self-advertisement for a C99 solution to ensure matching specifiers and arguments: Formatted print without the need to specify type matching specifiers using _Generic.

Transferring comments to answer.
The main reason vsnprintf() was added to C99 was that it is hard to protect vsprintf() or similar. One workaround is to open /dev/null, use vfprintf() to format the data to it, note how big a result was needed, and then decide whether it is safe to proceed. Icky, especially if you open the device on each call.
That means your code might become:
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
extern char *formatString(const char *format, ...);
char *formatString(const char *format, ...)
{
/* Strict C90: all declarations come before the first statement. */
static FILE *fp_null = NULL;
va_list ap;
int size;
int check;
char *result;
if (fp_null == NULL)
{
fp_null = fopen("/dev/null", "w");
if (fp_null == NULL)
return NULL;
}
/* First pass: format to the null device just to learn the length. */
va_start(ap, format);
size = vfprintf(fp_null, format, ap);
va_end(ap);
if (size < 0)
return NULL;
result = (char *) malloc(size + 1);
if (result == NULL)
return NULL;
/* Second pass: restart the argument list and format for real. */
va_start(ap, format);
check = vsprintf(result, format, ap);
va_end(ap);
assert(check == size);
(void) check; /* avoids an unused-variable warning when NDEBUG is defined */
return result;
}
int main(void)
{
char *r1 = formatString("%d Dancing Pigs = %4.2f%% of annual GDP (grandiose dancing pigs!)\n",
34241562, 21.2963);
char *r2 = formatString("%s [%-13.10s] %s is %d%% %s\n", "Peripheral",
"sub-atomic hyperdrive", "status", 99, "of normality");
if (r1 != NULL)
printf("r1 = %s", r1);
if (r2 != NULL)
printf("r2 = %s", r2);
free(r1);
free(r2);
return 0;
}
As written, with fp_null a static variable inside the function, the file stream cannot be closed. If that's a bother, make it a variable at file scope and provide a function that does if (fp_null != NULL) { fclose(fp_null); fp_null = NULL; }.
I'm unapologetically assuming a Unix-like environment with /dev/null; you can translate that to NUL: if you're working on Windows.
Note that the original code in the question reuses ap for its second vsnprintf() call without calling va_end() and va_start() again (this code calls each of them twice); that would lead to disaster, because the first call has already consumed the argument list and ap is indeterminate afterwards. In my opinion, it is a good idea to put the va_end() as soon after the va_start() as possible, as shown in this code. Clearly, if your function is itself stepping through the va_list, then there will be a bigger gap than shown here, but when you're simply relaying the variable arguments to another function as here, there should be just the one line in between.
The code compiles cleanly on a Mac running macOS 10.14 Mojave using GCC 8.2.0 (compiled on macOS 10.13 High Sierra) with the command line:
$ gcc -O3 -g -std=c90 -Wall -Wextra -Werror -Wmissing-prototypes \
> -Wstrict-prototypes vsnp37.c -o vsnp37
$
When run, it produces:
r1 = 34241562 Dancing Pigs = 21.30% of annual GDP (grandiose dancing pigs!)
r2 = Peripheral [sub-atomic   ] status is 99% of normality

Related

Is there any gcc compiler option or something similar to bypass the obsolescence of the gets function?

Let's get this out of the way first: I am fully aware of why the gets function is deprecated and how it can be used maliciously in uncontrolled environments. I am positive the function can be used safely in my setup.
"Alternative" functions won't do for my use case. The software I am writing is very performance-sensitive, so using fgets and then using one full string loop (either explicitly or by a helper function like strlen) to get rid of the newline at the end of the string simply won't do. Writing my own gets function using getchar or something similar will also probably be much less efficient than a native implementation (not to mention very error-prone).
The Visual Studio compiler has a gets_s function which fits my needs, however this function is nowhere to be found in gcc. Is there any compiler option/flag to get this function back and/or some alternative function that it implements?
Thank you in advance.
Implementing your own safe gets() function using getchar_unlocked() is easy and reasonably efficient.
If your application is so performance-sensitive that you think fgets() followed by the scan to remove the newline is going to be the bottleneck, you should probably not use the stream functions at all, and use lower-level read() system calls or memory-mapped files instead.
In any case, you should carefully benchmark your application and use profiling tools to determine where the time is spent.
Here is a simple implementation that returns the line length but truncates the line to whatever fits in the destination array buf of length n and returns EOF at end of file:
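#define _POSIX_C_SOURCE 200809L /* getchar_unlocked() is POSIX, not ISO C */
#include <stdio.h>
/* read a line from stdin into buf (n bytes), truncating if needed;
return the full line length, or EOF at end of file */
int my_gets(char *buf, size_t n) {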
int c;
size_t i = 0;
while ((c = getchar_unlocked()) != EOF && c != '\n') {
if (i < n) {
buf[i] = c;
}
i++;
}
if (i < n) {
buf[i] = '\0';
} else
if (n > 0) {
buf[n - 1] = '\0';
}
if (c == EOF && i == 0) {
return EOF;
} else {
return (int)i;
}
}
If your goal is to parse a log file line by line, and this function is the only one reading from stdin, you can implement a custom buffering scheme with read or fread in a custom version of gets(). This would be portable and fast, but neither thread-safe nor elegant.
Here is an example that is 20% faster than fgets() on my system:
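#include <stdio.h>  /* fread, stdin, EOF */
#include <string.h> /* memchr, memcpy */
/* read a line from stdin
strip extra characters and the newline
return the number of characters before the newline, possibly >= n
return EOF at end of file
do not mix with other reads from stdin: this keeps its own buffer
*/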
static char gets_buffer[65536];
static size_t gets_pos, gets_end;
int my_fast_gets(char *buf, size_t n) {
size_t pos = 0;
for (;;) {
char *p = gets_buffer + gets_pos;
size_t len = gets_end - gets_pos;
char *q = memchr(p, '\n', len);
if (q != NULL) {
len = q - p;
}
if (pos + len < n) {
memcpy(buf + pos, p, len);
buf[pos + len] = '\0';
} else
if (pos < n) {
memcpy(buf + pos, p, n - pos - 1);
buf[n - 1] = '\0';
}
pos += len;
gets_pos += len;
if (q != NULL) {
gets_pos += 1;
return (int)pos;
}
gets_pos = 0;
gets_end = fread(gets_buffer, 1, sizeof gets_buffer, stdin);
if (gets_end == 0) {
return pos == 0 ? EOF : (int)pos;
}
}
}
I suppose you are running on Windows, so it's possible that none of this information is relevant. The tests below were done on a Linux Ubuntu laptop, not particularly leading edge. But, for what it's worth:
If gets is in your standard library (it's in my standard library, fwiw), then you only need to declare it to use it. It doesn't matter that it has been removed from your standard library headers:
char* gets(char* buf);
You could declare gets_s yourself, too, if that's the one you want to use.
This is entirely legal according to the C standard: "Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header." (§7.1.4 ¶2)
See @dbush's answer for the compiler option to avoid the deprecation warning.
That's not actually what I would do. I'd use the Posix-standard getline function. On a standard Linux install, you need a feature-test macro to see the declaration: #define _POSIX_C_SOURCE 200809L or #define _XOPEN_SOURCE 700. (You can use larger numbers if you want to.) getline avoids many of the issues with fgets (and, of course, gets) because it returns the length of the line read rather than copying its buffer argument to the return value. If your input handler needs (or can use) this information, it might save a few cycles to have it available. It certainly can be used to check for and remove the newline.
On my little laptop, using a file of 100,000,000 words as you suggest, I got the following timings for the three alternatives I tested:
gets (dangerous)   fgets (+strlen)   getline
----------------   ---------------   -------
1.9958             2.3585            2.0350
So it does show some overhead with respect to gets, but it's not (IMHO) very significant, and it's quite possible that the fact that you don't need strlen in your handler will recoup the small additional overhead.
Here are the loops I tested. (I call an external function called handle to avoid optimisation issues; in my test, all handle does is increment a global line count.)
gets (dangerous)
char buf[80]; // Disaster in waiting
for (;;) {
char* in = gets(buf);
if (in == NULL) break;
handle(in);
}
fgets (+strlen)
char buf[80]; // Safe. But long lines are not handled correctly.
for (;;) {
char* in = fgets(buf, sizeof buf, stdin);
if (in == NULL) break;
size_t inlen = strlen(in);
if (inlen && in[inlen - 1] == '\n')
in[inlen - 1] = 0;
handle(in);
}
getline
size_t buflen = 80; // If I guessed wrong, the only cost is an extra malloc.
char* in = malloc(buflen); // If this returns NULL, getline() allocates its own buffer.
for (;;) {
ssize_t inlen = getline(&in, &buflen, stdin);
if (inlen == -1) break;
if (inlen && in[inlen - 1] == '\n')
in[inlen - 1] = 0;
handle(in);
}
free(in);
You can use the -Wno-deprecated-declarations option to prevent warnings for all deprecated functions. If you want to disable it only in specific instances, you can use pragmas for this:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
gets(str);
#pragma GCC diagnostic pop
Note that in both cases this only prevents the compiler from complaining. The linker will still give a warning.

Polish characters in C using files and lists

I need to get proper Polish characters "ąężźćśół". I used some solutions like setlocale, system chcp, wchar_t. Everything goes well as long as I don't use files/lists. wscanf, wprintf and wchar_t works perfectly.
But if I try to read something from a file and save it into a list (or even an array), then print it to the screen, I can't get proper Polish characters. With the lists, I get different results from run to run, for example z` , A2 , like random characters from nowhere. I've been trying to get good results by using fscanf and fgets and their wide (w) variants, but it doesn't work. Did I do something wrong?
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
struct dyk{
wchar_t line[200];
struct dyk *next;
};
typedef struct dyk dyk;
void printdyk(char name[100]){
dyk *wyp;
wyp = malloc(sizeof(dyk));
wchar_t yt[100];
FILE *dyktando;
dyktando = fopen(name, "r+");
if(dyktando == NULL){
wprintf(L"Błąd otwarcia pliku!\n"); //Can't open file
}else{
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
wprintf(L"%s\n", wyp->line); //write text from the list on the screen
wchar_t yt[100];
wscanf(L"%s", &yt); //testing strings comparing, so I have to put some variables
int n=strcmp(yt, wyp->line); //str compare
printf("%d", n); //result, it gives me -1 every time
}
fclose(dyktando);
}
I tested the function with a txt file containing only one character, "ż". It can't read from the file properly. At the start of the main function I put these 2 lines:
system("chcp 852");
setlocale(LC_ALL, ".852");
I'm using Code::Blocks, the mingw32-gcc compiler, and no flags.
You are not using wchar_t compatible functions everywhere in your code. In particular:
fgets(&wyp->line, sizeof(dyk), dyktando); //reading from file and send to the list
The wchar_t compatible version is fgetws. Also, wyp->line (without the & operator) is the correct argument.
int n=strcmp(yt, wyp->line); //str compare
wcscmp should be used instead.
Also note that sizeof on a wchar_t array is not correct when a function expects length in characters rather than bytes (like fgetws does).
A comment OP (Amatheon) made indicates that the true underlying problem is how to properly read files using wide-character functions.
To ensure maximum compatibility and portability, let's restrict ourselves to C99 (plus the POSIX ssize_t type). Consider the following example program:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#include <sys/types.h> /* ssize_t: POSIX, not C99 */
#ifdef USE_ERRNO_CONSTANTS
#define SET_ERRNO(value) (errno = (value))
#else
#define SET_ERRNO(value)
#endif
ssize_t get_wide_delimited(wchar_t **lineptr, size_t *sizeptr, wint_t delim, FILE *stream)
{
wchar_t *line = NULL;
size_t size = 0;
size_t used = 0;
wint_t wc;
if (!lineptr || !sizeptr || !stream) {
/* Invalid function parameters. NULL pointers are not allowed. */
SET_ERRNO(EINVAL);
return -1;
}
if (ferror(stream)) {
/* Stream is already in error state. */
SET_ERRNO(EIO);
return -1;
}
if (*sizeptr > 0) {
line = *lineptr;
size = *sizeptr;
} else {
*lineptr = NULL;
}
while (1) {
wc = fgetwc(stream);
if (wc == WEOF || wc == delim)
break;
if (used + 1 > size) {
/* Growth policy. We wish to allocate a chunk of memory at once,
so we don't need to do realloc() too often as it is a bit slow,
relatively speaking. On the other hand, we don't want to do
too large allocations, because that would waste memory.
Anything that makes 'size' larger than 'used' will work.
*/
if (used < 254)
size = 256;
else
if (used < 65536)
size = 2 * used;
else
size = (used | 65535) + 65521;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
line[used++] = wc;
}
if (wc == WEOF) {
/* Verify that the WEOF did not indicate a read error. */
if (ferror(stream)) {
/* Read error. */
SET_ERRNO(EIO);
return -1;
}
}
/* Ensure there is enough room for the delimiter and end-of-string mark. */
if (used + 2 > size) {
/* We could reuse the reallocation policy here,
with the exception that the minimum is used + 2, not used + 1.
For simplicity, we use the minimum reallocation instead.
*/
size = used + 2;
line = realloc(line, size * sizeof (wchar_t));
if (!line) {
/* Out of memory. */
SET_ERRNO(ENOMEM);
return -1;
}
*lineptr = line;
*sizeptr = size;
}
/* Append the delimiter, unless end-of-stream mark. */
if (wc != WEOF)
line[used++] = wc;
/* Append the end-of-string nul wide char,
but do not include it in the returned length. */
line[used] = L'\0';
/* Success! */
return (ssize_t)used;
}
ssize_t get_wide_line(wchar_t **lineptr, size_t *sizeptr, FILE *stream)
{
return get_wide_delimited(lineptr, sizeptr, L'\n', stream);
}
int main(int argc, char *argv[])
{
wchar_t *line = NULL, *p;
size_t size = 0;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (get_wide_line(&line, &size, in) > 0) {
linenum++;
/* We use another pointer to the line for simplicity.
We must not modify 'line' (except via 'free(line); line=NULL; size=0;'
or a similar reallocation), because it points to dynamically allocated buffer. */
p = line;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
free(line);
line = NULL;
size = 0;
return EXIT_SUCCESS;
}
Because the EINVAL, EIO, and ENOMEM errno constants are not defined by the C standard (it only defines EDOM, EILSEQ, and ERANGE), the get_wide_line() and get_wide_delimited() functions only set errno if you define the USE_ERRNO_CONSTANTS preprocessor macro.
The get_wide_line() and get_wide_delimited() functions are reimplementations of the getwline() and getwdelim() functions from ISO/IEC TR 24731-2:2010, which are the wide-character equivalents of the POSIX.1 getline() and getdelim() functions. Unlike fgets() or fgetws(), these use a dynamically allocated buffer to hold the line, so there is no fixed line length limit other than available memory.
I've explicitly marked the code to be under Creative Commons Zero license: No Rights Reserved. It means you can use it in your own code, under whatever license you want.
Note: I would really love users to push their vendors and C standard committee members to get these included in the bog-standard C library part of the next version of the C standard. As you can see above, they can already be implemented in standard C; it is just that the C library itself can do the same much more efficiently. The GNU C library is a perfect example of that (although even they are stalling on the implementation because of the lack of standardization). Just think how many buffer overflow bugs would be avoided if people used getline()/getdelim()/getwline()/getwdelim() instead of fgets()/fgetws()! And you would avoid having to think about what the maximum reasonable line length in each instance would be, too. Win-win!
(In fact, we could switch the return type to size_t, and use 0 instead of -1 as the error indicator. That would limit the changes to the text of the C standard to the addition of the four functions. It saddens and irritates me to no end, to have such a significant group of trivial functions so callously and ignorantly overlooked, for no sensible reason. Please, bug your vendors and any C standards committee members you have access to about this, as incessantly and relentlessly as you can manage. Both you and they deserve it.)
The essential parts of the program are
if (!setlocale(LC_ALL, ""))
This tells the C library to use the locale the user has specified.
Please, do not hardcode the locale value into your programs. In most operating systems, all you need to do is to change the LANG or LC_ALL environment variable to the locale you want to use, before running your program.
You might think that "well, I can hardcode it this time, because this is the locale used for this data", but even that can be a mistake, because new locales can be created at any time. This is particularly annoying when the character set part is hardcoded. For example, the ISO 8859 single-byte character set used in Western Europe is ISO 8859-15, not ISO 8859-1, because ISO 8859-15 has the € character in it, whereas ISO 8859-1 does not. If you have hardcoded ISO 8859-1 in your program, then it cannot correctly handle the € character at all.
if (fwide(stream, 1) < 1) for both stdout and file handles
While the C library does internally do an equivalent of the fwide() call based on which type of I/O function you use on the file handle the very first time, the explicit check is much better.
In particular, if the C library cannot support wide I/O to the file or stream represented by the handle, fwide() will return negative. (Unless the second parameter is also zero, it should never return zero; because of the issues in standardization, I recommend a strict return value check approach in this case, to catch vendors who decide to try to make life as difficult as possible for programmers trying to write portable code while technically still fulfilling the standard text, like Microsoft is doing. They even stuffed the C standard committee with their own representatives, so they could tweak C11 away from C99 features they didn't want to support, plus get a stamp of approval of their own nonstandard extensions nobody used before, to help create barriers for developers writing portable C code. Yeah, I don't trust their behaviour at all.)
ssize_t len = get_wide_line(&line, &size, handle);
If you initialize wchar_t *line = NULL; and size_t size = 0; prior to first call to get_wide_line() or get_wide_delimited(), the function will dynamically resize the buffer as needed.
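The return value is negative if and only if an error occurs. It is zero only when there is no more input, that is, when end of file is reached before any character is read (a blank line still returns 1, for its newline).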
When a line is read successfully, the return value is the number of wide characters in the buffer, including the delimiter if one was read (L'\n' for get_wide_line()), and is always positive (greater than zero). The buffer contents end with a terminating end-of-wide-string character, L'\0', which is not counted in the return value.
Note that when the delimiter is not L'\0', the buffer may contain embedded wide nul characters, L'\0'. In that case, len > wcslen(line).
The example program above skips any leading whitespace on each input line, and trims off the line at the first linefeed (L'\n'), carriage return (L'\r'), or nul (L'\0'). Because of this, the return value len is only checked for success (a positive return value greater than zero).
free(line); line = NULL; size = 0;
It is okay to discard the line at any point its contents are no longer needed. I recommend explicitly setting the line pointer to NULL, and the size to zero, to avoid use-after-free bugs. Furthermore, that allows any following get_wide_line() or get_wide_delimited() to correctly dynamically allocate a new buffer.
ferror(handle) after a wide input function fails
Just like with narrow streams and EOF, there are two cases why wide input functions might return WEOF (or return -1, depending on the function): because there is no more input, or because a read error occurred.
There is no reason whatsoever to write computer programs that ignore read or write errors, without reporting them to the user. Sure, they are rare, but not so rare that a programmer can sanely expect them to never occur. (In fact, with Flash memory on flimsy circuits stored in weak plastic housings and subjected to human-sized stresses (I've sat on mine time and time again), the errors aren't that rare.) It is just evil, rather similar to food preparers being too lazy to wash their hands, causing fecal bacteria outbreaks every now and then. Don't be a fecal bacteria spreader equivalent programmer.
Let's say you have a harebrained lecturer who does not allow you to use the above get_wide_line() or get_wide_delimited() functions.
Don't worry. We can implement the same program using fgetws(), if we restrict line to some fixed upper limit (of wide characters). Lines longer than that will read as two or more lines instead:
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>
#include <errno.h>
#ifndef MAX_WIDE_LINE_LEN
#define MAX_WIDE_LINE_LEN 1023
#endif
int main(int argc, char *argv[])
{
wchar_t line[MAX_WIDE_LINE_LEN + 1], *p;
unsigned long linenum;
FILE *in;
int arg;
if (!setlocale(LC_ALL, ""))
fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
if (fwide(stdout, 1) < 1)
fprintf(stderr, "Warning: Your C library does not support wide standard output.\n");
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
fprintf(stderr, " %s FILENAME [ FILENAME ... ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "This program will output the named files, using wide I/O.\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}
for (arg = 1; arg < argc; arg++) {
in = fopen(argv[arg], "r");
if (!in) {
fprintf(stderr, "%s: %s.\n", argv[arg], strerror(errno));
return EXIT_FAILURE;
}
if (fwide(in, 1) < 1) {
fprintf(stderr, "%s: Wide input is not supported from this file.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
linenum = 0;
while (1) {
/* If line is an array, (sizeof line / sizeof line[0]) evaluates to
the number of elements in it. This does not work if line is a pointer
to dynamically allocated memory. In that case, you need to remember
number of wide characters you allocated for in a separate variable,
and use that variable here instead. */
p = fgetws(line, sizeof line / sizeof line[0], in);
if (!p)
break;
/* Have a new line. */
linenum++;
/* Remove leading whitespace. */
while (iswspace(*p))
p++;
/* Trim off the line at the first occurrence of newline or carriage return.
(The line will also end at the first embedded nul wide character, L'\0',
if the file contains any.) */
p[wcscspn(p, L"\r\n")] = L'\0';
wprintf(L"%s: Line %lu: '%ls', %zu characters.\n", argv[arg], linenum, p, wcslen(p));
}
if (ferror(in)) {
fprintf(stderr, "%s: Read error.\n", argv[arg]);
fclose(in);
return EXIT_FAILURE;
}
if (fclose(in)) {
fprintf(stderr, "%s: Delayed read error.\n", argv[arg]);
return EXIT_FAILURE;
}
wprintf(L"%s: Total %lu lines read.\n", argv[arg], linenum);
fflush(stdout);
}
return EXIT_SUCCESS;
}
Aside from the function used to read each line, the difference is that instead of keeping the while loop condition as while ((p = fgetws(line, ...))) { ... }, I changed to the while (1) { p = fgetws(line, ...); if (!p) break; ... form that I believe is more readable.
I did deliberately show the longer, more complicated-looking one first, and this simpler one last, in the hopes that you would see that the more complicated-looking one actually has the simpler main() -- if we don't just count lines of code or something equally silly, but look at how many opportunities for mistakes there are.
As OP themselves wrote in a comment, the size of the buffer passed to fgets() or fgetws() is a real issue. There are rules of thumb, but they all suffer from being fragile against edits (especially the differences between arrays and pointers). With getline()/getdelim()/getwline()/getwdelim()/get_wide_line()/get_wide_delimited(), the rule of thumb is wchar_t *line = NULL; size_t size = 0; ssize_t len; and len = get_wide_line(&line, &size, handle);. No variations, and simple to remember and use. Plus it gets rid of any fixed limitations.

What causes vsprintf to throw a segmentation fault?

I am writing a simple wrapper for syslog to make logging from my program a bit easier and allow dumping log entries to the console when selected. I have the following log function defined
void logDebugFunction (int lineNumber, char* filename, const char* functionName, char* format, ...)
{
if (LOG_DEBUG >= argPtr->logLevel)
{
char buffer[1000];
char *entry;
va_list args;
va_start(args, format);
vsprintf(buffer, format, args);
va_end(args);
sprintf(entry, "%s:%d - %s - %s",filename, lineNumber, functionName, buffer);
syslog(LOG_MAKEPRI(0, (LOG_DEBUG)), "%s", entry);
if (argPtr->verbose)
{
// Print to stdout too
printf( "%s", entry);
printf("\n");
}
}
}
Which is called through the following macro:
#define logDebug(format,...) logDebugFunction(__LINE__, __FILE__, __func__, format, __VA_ARGS__)
From the main function, which is as follows:
int main(int argc, char *argv[])
{
// Set up syslog connection
openlog("ARController", LOG_CONS|LOG_PID|LOG_NDELAY, LOG_DAEMON);
// Set up our global arguments
struct arguments arguments;
argPtr = &arguments;
// set default values
arguments.verbose = 0;
arguments.foreground = 0;
arguments.logLevel = LOG_WARNING;
// Send a test debug message
logDebug("Test Debug message %d %s", 5, "a string");
// Close our syslog connection
closelog();
}
Now, when I try to run it, the only output I get is Segmentation fault (core dumped), which is obviously not what I want.
I've done some investigation using gdb and the --save-temps flag to verify the following:
In main.i I can see that the logDebug call in main has been replaced with logDebugFunction(72, "src/main.c", __func__, "Test Debug message %d %s", 5, "a string"); which is what I'd expect to see here.
When running, the segfault happens at the first vsprintf line in logDebugFunction
Just before the call to vsprintf all the mandatory arguments of the function are correct:
Breakpoint 2, logDebugFunction (lineNumber=72, filename=0x401450 "src/main.c", functionName=0x4014d3 <__func__.4035> "main", format=0x401437 "Test Debug message %d %s")
The va_list entries are what I'd expect them to be as shown by the following gdb commands (found here)
(gdb) p *(int *)(((char*)args[0].reg_save_area)+args[0].gp_offset)
$5 = 5
(gdb) p *(char * *)(((char*)args[0].reg_save_area)+args[0].gp_offset+8)
$6 = 0x40142e "a string"
When I step into the vsprintf call it seems like the arguments are correct: __IO_vsprintf (string=0x7ffffffedb40 "\200V", format=0x401437 "Test Debug message %d %s", args=0x7ffffffedb28) at iovsprintf.c:32
So as everything seems to be in order I'm a bit lost as to what the issue is and what steps I can take next.
I don't see anything wrong with the way you use va_list and vsprintf (ignoring that there are no sanity checks), so it could be that the output needs more than 1000 characters and buffer is simply not large enough, or that you're passing the arguments the wrong way. Have you tried using vprintf for debugging purposes?
But I see a definitive problem in the next lines:
char *entry;
...
sprintf(entry, "%s:%d - %s - %s",filename, lineNumber, functionName, buffer);
entry is an uninitialized pointer, pointing nowhere. If you try to read or write through that pointer, you get undefined behaviour, and a segfault is a typical result.
With snprintf you can get the length of the expression and then allocate memory for it dynamically with malloc (don't forget to free it afterwards). Or you can do
char entry[1024];
...
sprintf(entry, "%s:%d - %s - %s",filename, lineNumber, functionName, buffer);
assuming that no entry will be longer than 1023 characters.
EDIT: as requested in the comments, here is more detail on getting the length from snprintf.
Let's start with the signature of the function
#include <stdio.h>
int snprintf(char *str, size_t size, const char *format, ...);
The printf(3) man page says:
The functions snprintf() and vsnprintf() write at most size bytes
(including the terminating null byte ('\0')) to str.
If you just want the length, set size to 0 and str to NULL:
int msglen = snprintf(NULL, 0, fmt, exp1, exp2, exp3,...);
Bear in mind that this behaviour conforms to C99. With an older compiler or an older C standard you may get an unspecified return value.
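Putting the two steps together for the entry string: measure with snprintf(NULL, 0, ...), allocate length + 1 bytes, then format for real. This is a minimal C99 sketch; make_entry is a hypothetical name, and it calls snprintf twice with the same explicit arguments, so there is no va_list to reuse:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed "file:line - func - message"
 * string, or NULL on failure. The caller must free() the result. */
char *make_entry(const char *filename, int lineNumber,
                 const char *functionName, const char *buffer)
{
    /* First call: NULL buffer and size 0, so nothing is written; C99
     * snprintf returns the length the output would have had. */
    int len = snprintf(NULL, 0, "%s:%d - %s - %s",
                       filename, lineNumber, functionName, buffer);
    if (len < 0)
        return NULL;

    char *entry = malloc((size_t)len + 1);   /* +1 for the terminating '\0' */
    if (entry == NULL)
        return NULL;

    /* Second call: same arguments, now into a buffer that is big enough. */
    snprintf(entry, (size_t)len + 1, "%s:%d - %s - %s",
             filename, lineNumber, functionName, buffer);
    return entry;
}
```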
there are no checks that format matches the passed arguments; GCC's __attribute__((format(printf, ...))) can add them at compile time;
there are no checks that pointers are not null;
there is no check that buffer is large enough to hold the given string (use functions taking a buffer size, such as snprintf);
sprintf(entry, ... uses the uninitialized variable entry instead of a suitable buffer, which is Undefined Behavior; the attempt to write to whatever random location entry points to is the most likely reason for the segfault.
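A sketch addressing the points in that list, under some assumptions: it is a hypothetical variant named formatDebugEntry that writes into a caller-supplied buffer instead of printing, the PRINTF_LIKE macro is our own name, and the 1000-byte message buffer mirrors the question's. The format attribute is only applied on GCC/Clang:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* GCC and Clang check printf-style calls against this attribute; other
 * compilers just see a plain declaration. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt, first) __attribute__((format(printf, fmt, first)))
#else
#define PRINTF_LIKE(fmt, first)
#endif

/* Parameter 6 is the format string, parameter 7 the first variadic one. */
int formatDebugEntry(char *out, size_t outSize, int lineNumber,
                     const char *filename, const char *functionName,
                     const char *format, ...) PRINTF_LIKE(6, 7);

int formatDebugEntry(char *out, size_t outSize, int lineNumber,
                     const char *filename, const char *functionName,
                     const char *format, ...)
{
    char message[1000];
    va_list args;

    /* Sanity checks before anything is dereferenced. */
    if (out == NULL || outSize == 0 || format == NULL)
        return -1;
    if (filename == NULL)
        filename = "?";
    if (functionName == NULL)
        functionName = "?";

    /* vsnprintf bounds the message, unlike vsprintf. */
    va_start(args, format);
    vsnprintf(message, sizeof message, format, args);
    va_end(args);

    /* A real buffer with a known size, not an uninitialized pointer. */
    return snprintf(out, outSize, "%s:%d - %s - %s",
                    filename, lineNumber, functionName, message);
}
```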
In my case I encountered this when I accidentally returned from a function that was marked _Noreturn in a header (but not in the function definition itself) when writing C11.
This mistake did not cause a compilation error, didn't emit a warning (with -Wall), and wasn't caught by either AddressSanitizer (asan) or ThreadSanitizer (tsan), but code execution after that return was bonkers and it gave me misleading call traces.

Not null terminated string - a KlocWork error with no understandable reason

I've recently installed Klocwork and am trying to get rid of bugs in existing code.
The error shown seems to be simple: no null at the termination of the char * _p_.
I have manually added a null terminator (even though there should be no need), but that doesn't satisfy Klocwork. Any ideas?
The exact message is:
Incorrectly terminated string 'p' causes a buffer overflow in p.
char *ptr;
int writtenchars = 0 ;
va_list args;
char* destStr;
if (argc != 2) {
printf(" wrong parameters number - %d instead of %d\n", argc, 2);
char str[25]="wrong parameters number ";
char *_p_; /********************************************************/
va_start(args, str);
destStr = (char*) malloc(SNMP_BUF_LEN);
_p_= destStr;
if (destStr == NULL) {
printf("WARNING: Failed to alloc memory in in function \"snmp_rebuildstringinbuf!!!\" \n");
destStr="kukuRiko";
}
else {
writtenchars = (int) vsnprintf(destStr, 4095, str, args);
if (writtenchars>SNMP_BUF_LEN) {
printf("WARNING: Too long string rebuilded in function \"snmp_rebuildstringinbuf!!!\" %d chars\n",writtenchars);
}
destStr[writtenchars] = '\0' ; //Moshe - making sure the last value of the string is null terminated in order to prevent future buffer overflows.
}
va_end(args);
/******************************************************************************/
//The KlocWork error relates to this line //
logCWriteLog_msg(moduleId, level, __FILE__, __LINE__, _p_, ltrue);
free (_p_);
===========================================================
Update: thanks for your answers, but the problem seems a bit more obscure than that. I have reduced the code to this simple case.
When all the code is in one function there is no error, whereas when the allocation section is moved into its own function (with the text passed as a parameter) the Klocwork error comes back.
Here is the version without the error:
char *_p_; /*+++++++++++++++++++*/
int writtenchars = 0 ;
va_list args;
char* destStr;
char* str = "hello World";
va_start(args, str);
destStr = (char*)malloc(SNMP_BUF_LEN);
if (destStr == NULL) {
printf("WARNING: Failed to alloc memory in function \n");
}
else {
writtenchars = (int) vsnprintf(destStr, (SNMP_BUF_LEN) - 1, str, args);
}
/*+++++++++++++++++++*/
_p_ = destStr ;
if (_p_ != NULL) {
logCWriteLog_msg(moduleId, level, __FILE__, __LINE__, _p_, ltrue);
}
free (_p_);
/***********************************************************/
whereas moving the code between the /*+++ */ markers into a separate function, as below, brings the Klocwork error back:
char *writingToSomeBuffer (char * str) {
int writtenchars = 0 ;
va_list args;
char* destStr;
va_start(args, str);
destStr = (char*)malloc(SNMP_BUF_LEN);
if (destStr == NULL) {
printf("WARNING: Failed to alloc memory in function \n");
}
else {
writtenchars = (int) vsnprintf(destStr, (SNMP_BUF_LEN) - 1, str, args);
}
return destStr;
}
int main () {
char *_p_;
_p_ = writingToSomeBuffer("hello world");
if (_p_ != NULL) {
logCWriteLog_msg(moduleId, level, __FILE__, __LINE__, _p_, ltrue);
}
free (_p_);
return 0 ;
}
Any ideas?
Klocwork is correctly diagnosing a real problem: if memory allocation fails, you end up using a null pointer:
_p_= destStr;
if (destStr == NULL)
{
printf("WARNING: Failed to alloc memory in in function ...\n");
destStr = "kukuRiko";
At this point, the (horribly named) '_p_' variable is still null, but you go ahead and use it in the printing operation below.
Also note that the 'trivial' fix of adding '_p_ = destStr;' after this breaks your memory management; you later do 'free(_p_);', which will lead to horrible problems if '_p_' points to the constant string.
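One way to restructure the allocation so neither problem can occur is sketched below. Assumptions: rebuildString is a hypothetical name, and SNMP_BUF_LEN is the project's own constant (the 256 is only a placeholder so the sketch compiles). On failure it returns NULL rather than a string literal, so there is never anything non-freeable to free, and it uses the same size for malloc and vsnprintf. It is also declared variadic, because va_start is only valid inside a function declared with ..., applied to that function's last named parameter, which neither snippet in the question does.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef SNMP_BUF_LEN
#define SNMP_BUF_LEN 256   /* placeholder: use the project's real value */
#endif

/* Hypothetical helper: returns a malloc'ed formatted string, or NULL if the
 * allocation failed. The caller checks for NULL and frees the result. */
char *rebuildString(const char *format, ...)
{
    va_list args;
    char *destStr = malloc(SNMP_BUF_LEN);

    if (destStr == NULL)
        return NULL;    /* no string-literal fallback, so free() stays safe */

    va_start(args, format);
    /* Same size as the allocation, so vsnprintf cannot write past the end. */
    vsnprintf(destStr, SNMP_BUF_LEN, format, args);
    va_end(args);
    return destStr;
}
```

The caller then calls logCWriteLog_msg and free only inside an if (p != NULL) block.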
You also have 'memory in in function' in the message. And 'wrong parameters number' does mean roughly the same as 'wrong number of parameters' but the latter is more idiomatic English. I'm not convinced any of the exclamation marks are helpful in the error message; there is a strong argument that they should go outside the double quotes surrounding the function name even if one of them is deemed desirable.
With the revised version of the problem, I wonder if Klocwork is diagnosing what Microsoft says of its vsnprintf(): that it does not guarantee null termination (which is different from what C99 and POSIX say).
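If you have to live with a vsnprintf that may not terminate on truncation, a common defensive pattern is to pass one byte less than the buffer size and always write the terminator yourself. A minimal sketch (formatTerminated is a hypothetical name). With a conforming vsnprintf this costs one byte of capacity; with a non-terminating variant the last byte is still guaranteed to be '\0':

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: formats into buf and guarantees a terminating '\0'
 * even if the vsnprintf in use does not add one when it truncates. */
void formatTerminated(char *buf, size_t size, const char *format, ...)
{
    va_list args;

    if (buf == NULL || size == 0)
        return;

    va_start(args, format);
    /* Keep the last byte out of vsnprintf's reach... */
    vsnprintf(buf, size - 1, format, args);
    va_end(args);

    /* ...and terminate it explicitly. */
    buf[size - 1] = '\0';
}
```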
Jonathan has it right. We've recently broken up this checker into two families that might explain it better:
http://www.klocwork.com/products/documentation/Insight-9.1/Checkers:NNTS.MIGHT
http://www.klocwork.com/products/documentation/Insight-9.1/Checkers:NNTS.MUST
We are currently working on cleaning this up and making it easier to understand, not only the problem but the solution as well.
Klocwork's error aside, I think this code is wrong. Why are you limiting vsnprintf to 4095 bytes while the buffer size is SNMP_BUF_LEN? How do those two relate to each other? If SNMP_BUF_LEN < 4095, then you may have just overflowed your buffer. Why not pass SNMP_BUF_LEN as the size argument to vsnprintf?
Also, the write to destStr[writtenchars] is suspect. Depending on the variant of vsnprintf (they do vary), writtenchars might be the number of characters it wanted to write, which would again cause you to write past the end of your buffer.
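To index the buffer safely, clamp the return value before using it. A sketch assuming C99 semantics (a return value >= size means truncation) while also treating a negative return as failure; formatStored is a hypothetical name. Its result is always a valid index for the terminating '\0':

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats into buf and returns how many characters
 * were actually stored, never more than size - 1. */
size_t formatStored(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int wanted;

    if (buf == NULL || size == 0)
        return 0;

    va_start(args, format);
    wanted = vsnprintf(buf, size, format, args);
    va_end(args);

    if (wanted < 0) {            /* error: report an empty string */
        buf[0] = '\0';
        return 0;
    }
    if ((size_t)wanted >= size)  /* truncated: wanted is the untruncated length */
        return size - 1;
    return (size_t)wanted;
}
```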
That all said, Klocwork isn't perfect. We had macros that were very explicitly trying to be safe, and Klocwork mis-detected them as potentially overrunning the string. I think that was a snprintf case as well.
Overall a good product, but it does have a few holes and you can't fix all its complaints.

How can I temporarily redirect printf output to a c-string?

I'm writing an assignment which involves adding some functionality to PostgreSQL on a Solaris box. As part of the assignment, we need to print some information on the client side (i.e. using elog).
PostgreSQL already has lots of helper methods which print out the required information, however, the helper methods are packed with hundreds of printf calls, and the elog method only works with c-style strings.
Is there a way I could temporarily redirect printf calls to a buffer so I can easily send it over elog to the client?
If that's not possible, what would be the simplest way to modify the helper methods to end up with a buffer as output?
If you define your own version of printf and link to it prior to the libc version, your version will supersede the standard version.
You should also be able to supersede the standard version by using LD_PRELOAD to load a library that has printf defined.
To write your own printf, you will want to use stdarg functionality:
#include <stdarg.h>
#include <stdio.h>

/* redirect_printf, LARGESIZE and log() are your own definitions */
int printf(const char *fmt, ...)
{
    int rv;
    va_list ap;
    va_start(ap, fmt);
    if (redirect_printf)
    {
#ifdef HAVE_VLOG
        // If you have a vlog function that takes a va_list
        vlog(fmt, ap);
        rv = ...;
#else
        char buffer[LARGESIZE];
        rv = vsnprintf(buffer, sizeof(buffer), fmt, ap);
        log(buffer);
#endif
    }
    else
    {
        rv = vprintf(fmt, ap);
    }
    va_end(ap);
    return rv;
}
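This simple version will truncate data when the final formatted output is longer than LARGESIZE. If you don't want that, you can first call vsnprintf with a NULL buffer to get the final size, do a dynamic allocation, and then call vsnprintf (or vsprintf) a second time to format the buffer. Because the first call consumes the va_list, make a copy with va_copy (C99) before it and use the copy for one of the two calls.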
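A sketch of that two-pass approach, with hypothetical names vformatAlloc and formatAlloc. The measuring pass runs on a va_copy copy, so the original va_list is still intact for the formatting pass:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats into a malloc'ed buffer of exactly the right
 * size. Returns NULL on failure; the caller must free() the result. */
char *vformatAlloc(const char *fmt, va_list ap)
{
    va_list copy;
    int len;
    char *buf;

    /* The measuring pass consumes its va_list, so give it a copy. */
    va_copy(copy, ap);
    len = vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    /* Formatting pass with the untouched original va_list. */
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    return buf;
}

/* Variadic front end, e.g. for use inside the redirected printf. */
char *formatAlloc(const char *fmt, ...)
{
    va_list ap;
    char *result;

    va_start(ap, fmt);
    result = vformatAlloc(fmt, ap);
    va_end(ap);
    return result;
}
```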
You're wrong: elog supports format strings just like printf. Here's an example from the Postgres source code:
elog(DEBUG4, "TZ \"%s\" gets max score %d", tzname, i);
So all you need to do is replace each printf call with an elog call that takes the same arguments, preceded by the log level.
The simplest way is to modify the helper methods to call sprintf(). Whether or not you can hack that in easily, I don't know. Maybe
#define printf(...) sprintf(buffer, __VA_ARGS__)
will do it for you. You'll still need to define buffer for each helper function and return its contents to whoever needs them. Note that each call writes from the start of buffer, so it overwrites what the previous printf produced.
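If the helpers call printf many times, an appending, bounded variant avoids both the overwriting and the overflow risk. A sketch under stated assumptions: struct outbuf, outbuf_printf and the 4096-byte capacity are all hypothetical choices, and the redirecting macro needs C99 variadic macros:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulator: each call appends instead of overwriting, and
 * output that does not fit is truncated rather than overflowing. */
struct outbuf {
    char data[4096];
    size_t used;        /* characters stored so far, excluding the '\0' */
};

int outbuf_printf(struct outbuf *ob, const char *fmt, ...)
{
    va_list ap;
    size_t room = sizeof ob->data - ob->used;   /* always at least 1 */
    int n;

    va_start(ap, fmt);
    n = vsnprintf(ob->data + ob->used, room, fmt, ap);
    va_end(ap);

    /* Advance by what was actually stored, keeping space for the '\0'. */
    if (n > 0)
        ob->used += ((size_t)n < room) ? (size_t)n : room - 1;
    return n;
}

/* Inside a helper, printf calls could then be redirected with:
 *   #define printf(...) outbuf_printf(&helper_output, __VA_ARGS__)
 * where helper_output is a struct outbuf initialized as { "", 0 }. */
```

When the helper is done, helper_output.data holds everything it printed and can be passed to elog in one go.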
If you can tolerate the use of a temporary file, you could redirect standard output with the freopen() call:
newstdout = freopen("/tmp/log", "w", stdout);
This will force all the printf calls to be written to /tmp/log instead of the console. At some convenient point later in your program you could then open the same file for reading:
readfd = fopen("/tmp/log", "r");
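and forward the contents that have been added using something like this:
#include <stdio.h>

/* readfd is the FILE * returned by fopen("/tmp/log", "r") above */
void forward_to_elog(void)
{
    size_t bytesread;
    char buf[100];

    /* Push any printf output still in stdout's buffer into the file. */
    fflush(stdout);

    /* Read chunks of up to 99 bytes, leaving room for the terminator. */
    while ((bytesread = fread(buf, 1, sizeof(buf) - 1, readfd)) > 0) {
        buf[bytesread] = '\0';
        /* call elog(buf) */ ;
    }

    /* Clear the end-of-file indicator so a later call sees new output. */
    clearerr(readfd);
}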
If you keep the file open you can call forward_to_elog() multiple times to incrementally forward the contents that have been added.
The tmpnam() function can be used to get a name for the temporary file if you don't want to hard-code one, although mkstemp() is safer because it creates the file atomically under the name it picks.

Resources