Why does C not have an snwprintf function? - c

Does anyone know why there is no snwprintf function in the C standard library?
I am aware of swprintf, but that doesn't have the same semantics of a true, wchar_t version of snprintf. As far as I can tell, there is no easy way to implement an snwprintf function using [v]swprintf:
Unlike snprintf, swprintf does not return the necessary buffer size; if the supplied buffer is insufficient, it simply returns -1. This is indistinguishable from failure due to encoding errors, so I can't keep retrying with progressively larger buffers hoping that it eventually will succeed.
I suppose I could set the last element of the buffer to be non-NUL, call swprintf, and assume that truncation occurred if that element is NUL afterward. However, is that guaranteed to work? The standard does not specify what state the buffer should be in if swprintf fails. (In contrast, snprintf describes which characters are written and which are discarded.)

See the answer given by Larry Jones here.
Essentially, swprintf was added in C95 while snprintf was added in C99 and since many implementations already returned the number of required characters (for snprintf) and it seemed a useful thing to do, that was the behavior that was standardized. They didn't think that behavior was important enough to break backwards compatibility with swprintf by adding it (which was added without that behavior several years earlier).

Related

Why the different behavior of snprintf vs swprintf?

The C standard states the following from the standard library function snprintf:
"The snprintf function is equivalent to fprintf, except that the
output is written into an array (specified by arguments) rather than
to a stream. If n is zero, nothing is written, and s may be a null
pointer. Otherwise, output characters beyond the n-1st are discarded
rather than being written to the array, and a null character is
written at the end of the characters actually written into the array.
If copying takes place between objects that overlap, the behavior is
undefined."
"The snprintf function returns the number of characters that would
have been written had n been sufficiently large, not counting the
terminating null character, or a negative value if an encoding error
occurred. Thus, the null-terminated output has been completely written
if and only if the returned value is nonnegative and less than n."
Compare it to the statement about swprintf:
"The swprintf function is equivalent to fwprintf, except that the
argument s specifies an array of wide characters into which the
generated output is to be written, rather than written to a stream. No
more than n wide characters are written, including a terminating null
wide character, which is always added (unless n is zero)."
"The swprintf function returns the number of wide characters written
in the array, not counting the terminating null wide character, or a
negative value if an encoding error occurred or if n or more wide
characters were requested to be written."
At first glance it may seem like snprintf and swprintf are complete equivalent to each other, the latter merely handling wide strings and the former narrow strings. However, that's not the case. While snprintf returns the number of characters that would have been written if n had been large enough, swprintf returns a negative value in this case (which means that you can't know how many characters would have been written if there had been enough space). This makes the two functions not fully interchangeable, because their behavior is different in this regard (and thus the latter can't be used for some thing that the former can, such as evaluating how long the output buffer would need to be, before actually creating it.)
Why would they make this difference? I suppose the behavior of swprintf makes the implementation more efficient when n is too small, but still, why the difference? I don't think it's even a question of snprintf being older and thus "legacy" and "dragging the weight of its history, which can't be changed later" and swprintf being newer and thus free to be improved, because both were introduced in C99.
There is, however, another significantly subtler difference between the two specifications. If you notice, the specifications are not merely carbon-copies of each other, with the only difference being the return value. That's another, much subtler difference, and that's the somewhat ambiguous behavior of what happens if n is too small for the string-to-be-printed.
The specification for snprintf quite clearly states that the output will be written up to n-1 characters even when n is too small, and the null character will be written at the end of it always. The specification of swprintf almost states this... except it leaves it ambiguous in its specification of the return value.
More specifically a negative return value is used to signal that an error occurred while trying to write the string to the destination. A negative value is also returned when n was too small. It's left ambiguous whether this is actually considered an error situation or not. This is significant because if it's considered an error, then the implementation is free to not write all, or anything, into the destination, because it can be signaling "an error occurred, the output is invalid". The first paragraph of the specification makes it sound like at most n-1 characters are always written, and an ending null character is always written ("which is always added"), but the second paragraph about the return value leaves it ambiguous whether this is actually an error situation and whether the implementation can choose not to write those things in this case.
This is significant because the glibc implementation of swprintf does not write the final null character when n is too small, making the result an invalid string. While I can't find definitive information on this, I have got the impression that the developers of glibc have interpreted the standard in such a manner that they don't have to write the final null character (or anything) to the output because this is an error situation.
The thing is that the standard seems to be very ambiguous and vague in this regard. Is it a correct interpretation? Or are they misinterpreting? Why would the standard leave it this ambiguous?
My interpretation differs from that of the glibc developers. I understand the second paragraph to mean:
A negative value is returned if:
an encoding error occurred, or
n or more wide characters were requested to be written.
I don't see how this could be interpreted as n being too small being considered an error.

Does sscanf require a null terminated string as input?

A recently discovered explanation for GTA lengthy load times(1) showed that many implementations of sscanf() call strlen() on their input string to set up a context object for an internal routine shared with other scanning functions (scanf(), fscanf()...). This can become a performance bottleneck when the input string is very long. Parsing a 10MB JSON file loaded as a string with repeated calls to sscanf() with an offset and a %n conversion proved to be a dominant cause for the load time.
My question is should sscanf() even read the input string beyond the bytes necessary for the conversions to complete? For example does the following code invoke undefined behavior:
int test(void) {
char buf[1] = { '1' };
int v;
sscanf(buf, "%1d", &v);
return v;
}
The function should return 1 and does not need to read more than one byte from buf, but is sscanf() allowed to read from buf beyond the first byte?
(1) references provided by JdeBP:
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
https://news.ycombinator.com/item?id=26297612
https://github.com/biojppm/rapidyaml/issues/40
Here are the relevant parts from the C Standard:
7.21.6.7 The sscanf function Synopsis
Synopsis
#include <stdio.h>
int sscanf(const char * restrict s, const char * restrict format, ...);
Description
The sscanf function is equivalent to fscanf, except that input is obtained from a string (specified by the argument s) rather than from a stream. Reaching the end of the string is equivalent to encountering end-of-file for the fscanf function. If copying takes place between objects that overlap, the behavior is undefined.
Returns
The sscanf function returns the value of the macro EOF if an input failure occurs before the first conversion (if any) has completed. Otherwise, the sscanf function returns the number of input items assigned, which can be fewer than provided for, or even zero, in the event of an early matching failure.
The input is specifically referred to as a string, so it should be null terminated
Albeit none of the characters in the string beyond the initial prefix that matches the conversion specifier and potentially the next byte that helped determine the end of the matching sequence are used for the conversion, these characters must be followed by a null terminator so the input is a well formed string, and it is conforming to call strlen() on it to determine the input length.
To avoid linear time complexity on long input strings, sscanf() should limit the scan for the end of string to a small size with strnlen() or equivalent and pass an appropriate refill function. Passing a huge length and letting the internal routine special case the null byte is an even better approach.
In the mean time, programmers should avoid passing long input strings to sscanf() and use more specialized functions for their parsing tasks, such as strtol(), which also requires a well formed C string, but is implemented in a more conservative way. This would also avoid potential undefined behavior on number conversions for out of range string representations.
When the Standard was written, many library functions were handled identically by almost all existing implementations, but some implementations may have had good reasons for handling a few cases differently. If the number of implementations that would have reason to differ from the commonplace behavior was substantial, then the Committee would either require that all implementations behave in the common fashion (as happens when e.g. computing UINT_MAX+1u), or explicitly state that they were not required to do so (as when e.g. computing INT_MAX+1). In cases where there was a clear common behavior, but it might not be practical on all implementations, however, the Committee generally simply refrained from saying anything, on the presumption that most compilers would have no reason to deviate from the common behavior, and the authors of those that would have reason to deviate would be better placed than the Committee to judge the pros and cons of following the common behavior versus deviating from it.
The sscanf behavior at issue fits the latter pattern. The Committee didn't want to mandate that implementations which would have trouble if the data source didn't have a trailing zero byte must be changed to deal with such data sources, but nor did they want to require that programmers copy data from sources that don't have trailing zero bytes to places that do before using sscanf upon it even when their implementation wouldn't care about anything beyond the portion of the source that would be meaningfully examined. Since makers of implementations that require a trailing zero will likely block any change to the Standard that would require them to tolerate its absence, and programmers whose implementations that impose no such needless requirements will block any change to the Standard that would require that they add extra data-copying steps to their code, the situation will remain deadlocked unless people can agree to categorize implementations that impose the trailing-byte requirement as "conforming but deficient" and require that they indicate such deficiency via predefined macro or other such means.

Unsafe C functions in HP-UX Environment

We are developing one scheduler application in the C Programming language. We are using the HP-UX environment to compile and deploy the code. During the yearly external audit of application, we received one report that contains following number of observations.
Dangerous functions: strcpy, strlen, strcat etc.
Buffer overflow: memcpy
Buffer overflow format string: sprintf, snprintf etc.
Format string: printf, sprintf etc.
They also give the general recommendation — Contains some safe functions that is:
strncpy_s
strnlen_s
strncat_s
memcpy_s etc..
Now, the problem is there no such library available for HP-UX environment. Above given functions are supported only in the Windows environment.
Is there any alternative available for dangerous functions in Linux environment?
How we can mitigate buffer-overflow format string and format string category?
See Do you use the TR 24731 'safe' functions? for a discussion of the demerits of the _s functions.
Functions such as strcpy() are safe if (and only if) you know how big the source string and the target strings are. If you don't know, you're playing with fire.
Buffer overflows with memcpy() are outright bugs in your program; you can't use it reliably if you don't know the sizes, or that the buffers do not overlap (memmove() is safer; it handles overlaps). There's an argument to say "you don't need strcpy() or strcat() etc because if you have enough data to use them safely, you can use memmove() or memcpy() instead". On the whole, strlen() is pretty safe — as long as you pass it a string. If you don't know whether you're dealing with strings, then you've got lots of problems; you must know that you're dealing with strings to call the string manipulation functions.
Note that the strncpy() and strncat() functions are not safe. The problem with strncpy() is that it does not null terminate the string if the source is too long. The problem with strncat() is that passing sizeof(dst) as the size of the destination is wrong, even if the string is empty; it has one of the weirdest, most bug-prone interfaces of any extant C function — gets() is no longer counted as extant. If you know the sizes of everything, you don't need them. If you don't know the sizes, using them won't make you safe.
Using sprintf() is unnecessarily dangerous; using snprintf() should be safe as long as you get the size correct and pay attention to data truncation by testing the return value. Check to see whether asprintf() and vasprintf() are available — and consider using them if they are.
Format string vulnerabilities arise where you have:
printf(fmtstr, value1, value2);
where the fmtstr argument can be controlled or influenced by the user. If you can determine where the format string comes from and know it is safe, then there isn't a problem, and it can help with the internationalization of your code. If you can't determine that the format string is safe, you are running risks. How serious those risks are depends on the context in which it is used. If the user root will be running the code, which seems likely for a scheduler, then you must be meticulous. You may be able to be a little more blasé if the users running the code will not be root, but it is difficult to ensure that no-one ever runs the code as root.
You're right that the _s functions are not available except on Windows. The external auditors have been downright unhelpful — suggesting the use of functions that are not available on the target platform is counter-productive. There is room to debate whether using the _s functions is sufficient, Microsoft notwithstanding. They can be misused, just as any function can. See the N1967 paper referenced in my answer to the TR 24731 question. (There are later papers available from the C standard committee's web site at http://www.open-std.org/jtc1/sc22/wg14/ which don't entirely agree with N1967 — N2336 from the Pre-London 2019 mailing, for example. I'm not sure I entirely agree with N2336.)
Consider whether strlcpy() and strlcat() are available and could/should be used for strcpy(), strncpy(), strcat(), strncat().

Which functions from the standard library must (should) be avoided?

I've read on Stack Overflow that some C functions are "obsolete" or "should be avoided". Can you please give me some examples of this kind of function and the reason why?
What alternatives to those functions exist?
Can we use them safely - any good practices?
Deprecated Functions
Insecure
A perfect example of such a function is gets(), because there is no way to tell it how big the destination buffer is. Consequently, any program that reads input using gets() has a buffer overflow vulnerability. For similar reasons, one should use strncpy() in place of strcpy() and strncat() in place of strcat().
Yet some more examples include the tmpfile() and mktemp() function due to potential security issues with overwriting temporary files and which are superseded by the more secure mkstemp() function.
Non-Reentrant
Other examples include gethostbyaddr() and gethostbyname() which are non-reentrant (and, therefore, not guaranteed to be threadsafe) and have been superseded by the reentrant getaddrinfo() and freeaddrinfo().
You may be noticing a pattern here... either lack of security (possibly by failing to include enough information in the signature to possibly implement it securely) or non-reentrance are common sources of deprecation.
Outdated, Non-Portable
Some other functions simply become deprecated because they duplicate functionality and are not as portable as other variants. For example, bzero() is deprecated in favor of memset().
Thread Safety and Reentrance
You asked, in your post, about thread safety and reentrance. There is a slight difference. A function is reentrant if it does not use any shared, mutable state. So, for example, if all the information it needs is passed into the function, and any buffers needed are also passed into the function (rather than shared by all calls to the function), then it is reentrant. That means that different threads, by using independent parameters, do not risk accidentally sharing state. Reentrancy is a stronger guarantee than thread safety. A function is thread safe if it can be used by multiple threads concurrently. A function is thread safe if:
It is reentrant (i.e. it does not share any state between calls), or:
It is non-reentrant, but it uses synchronization/locking as needed for shared state.
In general, in the Single UNIX Specification and IEEE 1003.1 (i.e. "POSIX"), any function which is not guaranteed to be reentrant is not guaranteed to be thread safe. So, in other words, only functions which are guaranteed to be reentrant may be portably used in multithreaded applications (without external locking). That does not mean, however, that implementations of these standards cannot choose to make a non-reentrant function threadsafe. For example, Linux frequently adds synchronization to non-reentrant functions in order to add a guarantee (beyond that of the Single UNIX Specification) of threadsafety.
Strings (and Memory Buffers, in General)
You also asked if there is some fundamental flaw with strings/arrays. Some might argue that this is the case, but I would argue that no, there is no fundamental flaw in the language. C and C++ require you to pass the length/capacity of an array separately (it is not a ".length" property as in some other languages). This is not a flaw, per-se. Any C and C++ developer can write correct code simply by passing the length as a parameter where needed. The problem is that several APIs that required this information failed to specify it as a parameter. Or assumed that some MAX_BUFFER_SIZE constant would be used. Such APIs have now been deprecated and replaced by alternative APIs that allow the array/buffer/string sizes to be specified.
Scanf (In Answer to Your Last Question)
Personally, I use the C++ iostreams library (std::cin, std::cout, the << and >> operators, std::getline, std::istringstream, std::ostringstream, etc.), so I do not typically deal with that. If I were forced to use pure C, though, I would personally just use fgetc() or getchar() in combination with strtol(), strtoul(), etc. and parse things manually, since I'm not a huge fan of varargs or format strings. That said, to the best of my knowledge, there is no problem with [f]scanf(), [f]printf(), etc. so long as you craft the format strings yourself, you never pass arbitrary format strings or allow user input to be used as format strings, and you use the formatting macros defined in <inttypes.h> where appropriate. (Note, snprintf() should be used in place of sprintf(), but that has to do with failing to specify the size of the destination buffer and not the use of format strings). I should also point out that, in C++, boost::format provides printf-like formatting without varargs.
Once again people are repeating, mantra-like, the ludicrous assertion that the "n" version of str functions are safe versions.
If that was what they were intended for then they would always null terminate the strings.
The "n" versions of the functions were written for use with fixed length fields (such as directory entries in early file systems) where the nul terminator is only required if the string does not fill the field. This is also the reason why the functions have strange side effects that are pointlessly inefficient if just used as replacements - take strncpy() for example:
If the array pointed to by s2 is a
string that is shorter than n bytes,
null bytes are appended to the copy in
the array pointed to by s1, until n
bytes in all are written.
As buffers allocated to handle filenames are typically 4kbytes this can lead to a massive deterioration in performance.
If you want "supposedly" safe versions then obtain - or write your own - strl routines (strlcpy, strlcat etc) which always nul terminate the strings and don't have side effects. Please note though that these aren't really safe as they can silently truncate the string - this is rarely the best course of action in any real-world program. There are occasions where this is OK but there are also many circumstances where it could lead to catastrophic results (e.g. printing out medical prescriptions).
Several answers here suggest using strncat() over strcat(); I'd suggest that strncat() (and strncpy()) should also be avoided. It has problems that make it difficult to use correctly and lead to bugs:
the length parameter to strncat() is related to (but not quite exactly - see the 3rd point) the maximum number of characters that can be copied to the destination rather than the size of the destination buffer. This makes strncat() more difficult to use than it should be, particularly if multiple items will be concatenated to the destination.
it can be difficult to determine if the result was truncated (which may or may not be important)
it's easy to have an off-by-one error. As the C99 standard notes, "Thus, the maximum number of characters that can end up in the array pointed to by s1 is strlen(s1)+n+1" for a call that looks like strncat( s1, s2, n)
strncpy() also has an issue that can result in bugs you try to use it in an intuitive way - it doesn't guarantee that the destination is null terminated. To ensure that you have to make sure you specifically handle that corner case by dropping a '\0' in the buffer's last location yourself (at least in certain situations).
I'd suggest using something like OpenBSD's strlcat() and strlcpy() (though I know that some people dislike those functions; I believe they're far easier to use safely than strncat()/strncpy()).
Here's a little of what Todd Miller and Theo de Raadt had to say about problems with strncat() and strncpy():
There are several problems encountered when strncpy() and strncat() are used as safe versions of strcpy() and strcat(). Both functions deal with NUL-termination and the length parameter in different and non-intuitive ways that confuse even experienced programmers. They also provide no easy way to detect when truncation occurs. ... Of all these issues, the confusion caused by the length parameters and the related issue of NUL-termination are most important. When we audited the OpenBSD source tree for potential security holes we found rampant misuse of strncpy() and strncat(). While not all of these resulted in exploitable security holes, they made it clear that the rules for using strncpy() and strncat() in safe string operations are widely misunderstood.
OpenBSD's security audit found that bugs with these functions were "rampant". Unlike gets(), these functions can be used safely, but in practice there are a lot of problems because the interface is confusing, unintuitive and difficult to use correctly. I know that Microsoft has also done analysis (though I don't know how much of their data they may have published), and as a result have banned (or at least very strongly discouraged - the 'ban' might not be absolute) the use of strncat() and strncpy() (among other functions).
Some links with more information:
http://www.usenix.org/events/usenix99/full_papers/millert/millert_html/
http://en.wikipedia.org/wiki/Off-by-one_error#Security_implications
http://blogs.msdn.com/michael_howard/archive/2004/10/29/249713.aspx
http://blogs.msdn.com/michael_howard/archive/2004/11/02/251296.aspx
http://blogs.msdn.com/michael_howard/archive/2004/12/10/279639.aspx
http://blogs.msdn.com/michael_howard/archive/2006/10/30/something-else-to-look-out-for-when-reviewing-code.aspx
Standard library functions that should never be used:
setjmp.h
setjmp(). Together with longjmp(), these functions are widely recogniced as incredibly dangerous to use: they lead to spaghetti programming, they come with numerous forms of undefined behavior, they can cause unintended side-effects in the program environment, such as affecting values stored on the stack. References: MISRA-C:2012 rule 21.4, CERT C MSC22-C.
longjmp(). See setjmp().
stdio.h
gets(). The function has been removed from the C language (as per C11), as it was unsafe as per design. The function was already flagged as obsolete in C99. Use fgets() instead. References: ISO 9899:2011 K.3.5.4.1, also see note 404.
stdlib.h
atoi() family of functions. These have no error handling but invoke undefined behavior whenever errors occur. Completely superfluous functions that can be replaced with the strtol() family of functions. References: MISRA-C:2012 rule 21.7.
string.h
strncat(). Has an awkward interface that are often misused. It is mostly a superfluous function. Also see remarks for strncpy().
strncpy(). The intention of this function was never to be a safer version of strcpy(). Its sole purpose was always to handle an ancient string format on Unix systems, and that it got included in the standard library is a known mistake. This function is dangerous because it may leave the string without null termination and programmers are known to often use it incorrectly. References: Why are strlcpy and strlcat considered insecure?, with a more detailed explanation here: Is strcpy dangerous and what should be used instead?.
Standard library functions that should be used with caution:
assert.h
assert(). Comes with overhead and should generally not be used in production code. It is better to use an application-specific error handler which displays errors but does not necessarily close down the whole program.
signal.h
signal(). References: MISRA-C:2012 rule 21.5, CERT C SIG32-C.
stdarg.h
va_arg() family of functions. The presence of variable-length functions in a C program is almost always an indication of poor program design. Should be avoided unless you have very specific requirements.
stdio.h
Generally, this whole library is not recommended for production code, as it comes with numerous cases of poorly-defined behavior and poor type safety.
fflush(). Perfectly fine to use for output streams. Invokes undefined behavior if used for input streams.
gets_s(). Safe version of gets() included in C11 bounds-checking interface. It is preferred to use fgets() instead, as per C standard recommendation. References: ISO 9899:2011 K.3.5.4.1.
printf() family of functions. Resource heavy functions that come with lots of undefined behavior and poor type safety. sprintf() also has vulnerabilities. These functions should be avoided in production code. References: MISRA-C:2012 rule 21.6.
scanf() family of functions. See remarks about printf(). Also, - scanf() is vulnerable to buffer overruns if not used correctly. fgets() is preferred to use when possible. References: CERT C INT05-C, MISRA-C:2012 rule 21.6.
tmpfile() family of functions. Comes with various vulnerability issues. References: CERT C FIO21-C.
stdlib.h
malloc() family of functions. Perfectly fine to use in hosted systems, though be aware of well-known issues in C90 and therefore don't cast the result. The malloc() family of functions should never be used in freestanding applications. References: MISRA-C:2012 rule 21.3.
Also note that realloc() is dangerous in case you overwrite the old pointer with the result of realloc(). In case the function fails, you create a leak.
system(). Comes with lots of overhead and although portable, it is often better to use system-specific API functions instead. Comes with various poorly-defined behavior. References: CERT C ENV33-C.
string.h
strcat(). See remarks for strcpy().
strcpy(). Perfectly fine to use, unless the size of the data to be copied is unknown or larger than the destination buffer. If no check of the incoming data size is done, there may be buffer overruns. Which is no fault of strcpy() itself, but of the calling application - that strcpy() is unsafe is mostly a myth created by Microsoft.
strtok(). Alters the caller string and uses internal state variables, which could make it unsafe in a multi-threaded environment.
Some people would claim that strcpy and strcat should be avoided, in favor of strncpy and strncat. This is somewhat subjective, in my opinion.
They should definitely be avoided when dealing with user input - no doubt here.
In code "far" from the user, when you just know the buffers are long enough, strcpy and strcat may be a bit more efficient because computing the n to pass to their cousins may be superfluous.
Avoid
strtok for multithreaded programs as its not thread-safe.
gets as it could cause buffer overflow
It is probably worth adding again that strncpy() is not the general-purpose replacement for strcpy() that it's name might suggest. It is designed for fixed-length fields that don't need a nul-terminator (it was originally designed for use with UNIX directory entries, but can be useful for things like encryption key fields).
It is easy, however, to use strncat() as a replacement for strcpy():
if (dest_size > 0)
{
dest[0] = '\0';
strncat(dest, source, dest_size - 1);
}
(The if test can obviously be dropped in the common case, where you know that dest_size is definitely nonzero).
Also check out Microsoft's list of banned APIs. These are APIs (including many already listed here) that are banned from Microsoft code because they are often misused and lead to security problems.
You may not agree with all of them, but they are all worth considering. They add an API to the list when its misuse has led to a number of security bugs.
It is very hard to use scanf safely. Good use of scanf can avoid buffer overflows, but you are still vulnerable to undefined behavior when reading numbers that don't fit in the requested type. In most cases, fgets followed by self-parsing (using sscanf, strchr, etc.) is a better option.
But I wouldn't say "avoid scanf all the time". scanf has its uses. As an example, let's say you want to read user input in a char array that's 10 bytes long. You want to remove the trailing newline, if any. If the user enters more than 9 characters before a newline, you want to store the first 9 characters in the buffer and discard everything until the next newline. You can do:
char buf[10];
scanf("%9[^\n]%*[^\n]", buf));
getchar();
Once you get used to this idiom, it's shorter and in some ways cleaner than:
char buf[10];
if (fgets(buf, sizeof buf, stdin) != NULL) {
char *nl;
if ((nl = strrchr(buf, '\n')) == NULL) {
int c;
while ((c = getchar()) != EOF && c != '\n') {
;
}
} else {
*nl = 0;
}
}
Almost any function that deals with NUL terminated strings is potentially unsafe.
If you are receiving data from the outside world and manipulating it via the str*() functions then you set yourself up for catastrophe
Don't forget about sprintf - it is the cause of many problems. This is true because the alternative, snprintf has sometimes different implementations which can make you code unportable.
linux: http://linux.die.net/man/3/snprintf
windows: http://msdn.microsoft.com/en-us/library/2ts7cx93%28VS.71%29.aspx
In case 1 (linux) the return value is the amount of data needed to store the entire buffer (if it is smaller than the size of the given buffer then the output was truncated)
In case 2 (windows) the return value is a negative number in case the output is truncated.
Generally you should avoid functions that are not:
buffer overflow safe (a lot of functions are already mentioned in here)
thread safe/not reentrant (strtok for example)
In the manual of each functions you should search for keywords like: safe, sync, async, thread, buffer, bugs
In all the string-copy/move scenarios - strcat(), strncat(), strcpy(), strncpy(), etc. - things go much better (safer) if a couple simple heuristics are enforced:
1. Always NUL-fill your buffer(s) before adding data.
2. Declare character-buffers as [SIZE+1], with a macro-constant.
For example, given:
#define BUFSIZE 10
char Buffer[BUFSIZE+1] = { 0x00 }; /* The compiler NUL-fills the rest */
we can use code like:
memset(Buffer,0x00,sizeof(Buffer));
strncpy(Buffer,BUFSIZE,"12345678901234567890");
relatively safely. The memset() should appear before the strncpy(), even though we initialized Buffer at compile-time, because we don't know what garbage other code placed into it before our function was called. The strncpy() will truncate the copied data to "1234567890", and will not NUL-terminate it. However, since we have already NUL-filled the entire buffer - sizeof(Buffer), rather than BUFSIZE - there is guaranteed to be a final "out-of-scope" terminating NUL anyway, as long as we constrain our writes using the BUFSIZE constant, instead of sizeof(Buffer).
Buffer and BUFSIZE likewise work fine for snprintf():
memset(Buffer,0x00,sizeof(Buffer));
if(snprintf(Buffer,BUFIZE,"Data: %s","Too much data") > BUFSIZE) {
/* Do some error-handling */
} /* If using MFC, you need if(... < 0), instead */
Even though snprintf() specifically writes only BUFIZE-1 characters, followed by NUL, this works safely. So we "waste" an extraneous NUL byte at the end of Buffer...we prevent both buffer-overflow and unterminated string conditions, for a pretty small memory-cost.
My call on strcat() and strncat() is more hard-line: don't use them. It is difficult to use strcat() safely, and the API for strncat() is so counter-intuitive that the effort needed to use it properly negates any benefit. I propose the following drop-in:
#define strncat(target,source,bufsize) snprintf(target,source,"%s%s",target,source)
It is tempting to create a strcat() drop-in, but not a good idea:
#define strcat(target,source) snprintf(target,sizeof(target),"%s%s",target,source)
because target may be a pointer (thus sizeof() does not return the information we need). I don't have a good "universal" solution to instances of strcat() in your code.
A problem I frequently encounter from "strFunc()-aware" programmers is an attempt to protect against buffer-overflows by using strlen(). This is fine if the contents are guaranteed to be NUL-terminated. Otherwise, strlen() itself can cause a buffer-overrun error (usually leading to a segmentation violation or other core-dump situation), before you ever reach the "problematic" code you are trying to protect.
atoi is not thread safe. I use strtol instead, per recommendation from the man page.

Why are strlcpy and strlcat considered insecure?

I understand that strlcpy and strlcat were designed as secure replacements for strncpy and strncat. However, some people are still of the opinion that they are insecure, and simply cause a different type of problem.
Can someone give an example of how using strlcpy or strlcat (i.e. a function that always null terminates its strings) can lead to security problems?
Ulrich Drepper and James Antill state this is true, but never provide examples or clarify this point.
Firstly, strlcpy has never been intended as a secure version of strncpy (and strncpy has never been intended as a secure version of strcpy). These two functions are totally unrelated. strncpy is a function that has no relation to C-strings (i.e. null-terminated strings) at all. The fact that it has the str... prefix in its name is just a historical blunder. The history and purpose of strncpy is well-known and well-documented. This is a function created for working with so called "fixed width" strings (not with C-strings) used in some historical versions of Unix file system. Some programmers today get confused by its name and assume that strncpy is somehow supposed to serve as limited-length C-string copying function (a "secure" sibling of strcpy), which in reality is complete nonsense and leads to bad programming practice. C standard library in its current form has no function for limited-length C-string copying whatsoever. This is where strlcpy fits in. strlcpy is indeed a true limited-length copying function created for working with C-strings. strlcpy correctly does everything a limited-length copying function should do. The only criticism one can aim at it is that it is, regretfully, not standard.
Secondly, strncat on the other hand, is indeed a function that works with C-strings and performs a limited-length concatenation (it is indeed a "secure" sibling of strcat). In order to use this function properly the programmer has to take some special care, since the size parameter this function accepts is not really the size of the buffer that receives the result, but rather the size of its remaining part (also, the terminator character is counted implicitly). This could be confusing, since in order to tie that size to the size of the buffer, programmer has to remember to perform some additional calculations, which is often used to criticize the strncat. strlcat takes care of these issues, changing the interface so that no extra calculations are necessary (at least in the calling code). Again, the only basis I see one can criticise this on is that the function is not standard. Also, functions from strcat group is something you won't see in professional code very often due to the limited usability of the very idea of rescan-based string concatenation.
As for how these functions can lead to security problems... They simply can't. They can't lead to security problems in any greater degree than the C language itself can "lead to security problems". You see, for quite a while there was a strong sentiment out there that C++ language has to move in the direction of developing into some weird flavor of Java. This sentiment sometimes spills into the domain of C language as well, resulting in rather clueless and forced criticism of C language features and the features of C standard library. I suspect that we might be dealing with something like that in this case as well, although I surely hope things are not really that bad.
Ulrich's criticism is based on the idea that a string truncation that is not detected by the program can lead to security issues, through incorrect logic. Therefore, to be secure, you need to check for truncation. To do this for a string concatenation means that you are doing a check along the lines of this:
if (destlen + sourcelen > dest_maxlen)
{
/* Bug out */
}
Now, strlcat does effectively do this check, if the programmer remembers to check the result - so you can use it safely:
if (strlcat(dest, source, dest_bufferlen) >= dest_bufferlen)
{
/* Bug out */
}
Ulrich's point is that since you have to have destlen and sourcelen around (or recalculate them, which is what strlcat effectively does), you might as well just use the more efficient memcpy anyway:
if (destlen + sourcelen > dest_maxlen)
{
goto error_out;
}
memcpy(dest + destlen, source, sourcelen + 1);
destlen += sourcelen;
(In the above code, dest_maxlen is the maximum length of the string that can be stored in dest - one less than the size of the dest buffer. dest_bufferlen is the full size of the dest buffer).
When people say, "strcpy() is dangerous, use strncpy() instead" (or similar statements about strcat() etc., but I am going to use strcpy() here as my focus), they mean that there is no bounds checking in strcpy(). Thus, an overly long string will result in buffer overruns. They are correct. Using strncpy() in this case will prevent buffer overruns.
I feel that strncpy() really doesn't fix bugs: it solves a problem that can be easily avoided by a good programmer.
As a C programmer, you must know the destination size before you are trying to copy strings. That is the assumption in strncpy() and strlcpy()'s last parameters too: you supply that size to them. You can also know the source size before you copy strings. Then, if the destination is not big enough, don't call strcpy(). Either reallocate the buffer, or do something else.
Why do I not like strncpy()?
strncpy() is a bad solution in most cases: your string is going to be truncated without any notice—I would rather write extra code to figure this out myself and then take the course of action that I want to take, rather than let some function decide for me about what to do.
strncpy() is very inefficient. It writes to every byte in the destination buffer. You don't need those thousands of '\0' at the end of your destination.
It doesn't write a terminating '\0' if the destination is not big enough. So, you must do so yourself anyway. The complexity of doing this is not worth the trouble.
Now, we come to strlcpy(). The changes from strncpy() make it better, but I am not sure if the specific behavior of strl* warrants their existence: they are far too specific. You still have to know the destination size. It is more efficient than strncpy() because it doesn't necessarily write to every byte in the destination. But it solves a problem that can be solved by doing: *((char *)mempcpy(dst, src, n)) = 0;.
I don't think anyone says that strlcpy() or strlcat() can lead to security issues, what they (and I) are saying that they can result in bugs, for example, when you expect the complete string to be written instead of a part of it.
The main issue here is: how many bytes to copy? The programmer must know this and if he doesn't, strncpy() or strlcpy() won't save him.
strlcpy() and strlcat() are not standard, neither ISO C nor POSIX. So, their use in portable programs is impossible. In fact, strlcat() has two different variants: the Solaris implementation is different from the others for edge cases involving length 0. This makes it even less useful than otherwise.
I think Ulrich and others think it'll give a false sense of security. Accidentally truncating strings can have security implications for other parts of the code (for example, if a file system path is truncated, the program might not be performing operations on the intended file).
There are two "problems" related to using strl functions:
You have to check return values
to avoid truncation.
The c1x standard draft writers and Drepper, argue that programmers won't check the return value. Drepper says we should somehow know the length and use memcpy and avoid string functions altogether, The standards committee argues that the secure strcpy should return nonzero on truncation unless otherwise stated by the _TRUNCATE flag. The idea is that people are more likely to use if(strncpy_s(...)).
Cannot be used on non-strings.
Some people think that string functions should never crash even when fed bogus data. This affects standard functions such as strlen which in normal conditions will segfault. The new standard will include many such functions. The checks of course have a performance penalty.
The upside over the proposed standard functions is that you can know how much data you missed with strl functions.
I don't think strlcpy and strlcat are consider insecure or it least it isn't the reason why they're not included in glibc - after all, glibc includes strncpy and even strcpy.
The criticism they got was that they are allegedly inefficient, not insecure.
According to the Secure Portability paper by Damien Miller:
The strlcpy and strlcat API properly check the target buffer’s bounds,
nul-terminate in all cases and return the length of the source string,
allowing detection of truncation. This API has been adopted by most
modern operating systems and many standalone software packages,
including OpenBSD (where it originated), Sun Solaris, FreeBSD, NetBSD,
the Linux kernel, rsync and the GNOME project. The notable exception
is the GNU standard C library, glibc [12], whose maintainer
steadfastly refuses to include these improved APIs, labelling them
“horribly inefficient BSD crap” [4], despite prior evidence that they
are faster is most cases than the APIs they replace [13]. As a result,
over 100 of the software packages present in the OpenBSD ports tree
maintain their own strlcpy and/or strlcat replacements or equivalent
APIs - not an ideal state of affairs.
That is why they are not available in glibc, but it is not true that they are not available on Linux. They are available on Linux in libbsd:
https://libbsd.freedesktop.org/
They're packaged in Debian and Ubuntu and other distros. You can also just grab a copy and use in your project - it's short and under a permissive license:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/strlcpy.c?rev=1.11
Security is not a boolean. C functions are not wholly "secure" or "insecure", "safe" or "unsafe". When used incorrectly, a simple assignment operation in C can be "insecure". strlcpy() and strlcat() may be used safely (securely) just as strcpy() and strcat() can be used safely when the programmer provides the necessary assurances of correct usage.
The main point with all of these C string functions, standard and not-so-standard, is the level to which they make safe/secure usage easy. strcpy() and strcat() are not trivial to use safely; this is proven by the number of times that C programmers have gotten it wrong over the years and nasty vulnerabilities and exploits have ensued. strlcpy() and strlcat() and for that matter, strncpy() and strncat(), strncpy_s() and strncat_s(), are a bit easier to use safely, but still, non-trivial. Are they unsafe/insecure? No more than memcpy() is, when used incorrectly.
strlcpy may trigger SIGSEGV, if src is not NUL-terminated.
/* Not enough room in dst, add NUL and traverse rest of src */
if (n == 0) {
if (siz != 0)
*d = '\0'; /* NUL-terminate dst */
while (*s++)
;
}
return(s - src - 1); /* count does not include NUL */

Resources