Why are strcpy() and strcat() not good in the embedded domain? - c

I want to know about the disadvantages of strcpy() and strcat(), and where these functions become dangerous in an embedded environment.
Somebody told me that we should never use strcpy, strcat and strlen in the embedded domain because they stop at the null character. We sometimes work on encrypted data in which a null character can appear, so these functions stop early and we don't get the actual result.
So I want to know what the problems are, what alternatives to these functions exist, and how to use those alternatives.

The str* functions work with strings. If you are dealing with strings, they're fine to use as long as you use them correctly; it's easy to create a buffer overflow if you use them incorrectly.
If you are dealing with binary data, which it sounds like you are, string handling functions are unsuitable (They're meant for strings after all, not binary data). Use mem* functions for dealing with binary data.
In C, a string is a sequence of chars that ends with a nul byte. If you're dealing with binary data, there might very well be a char with the value 0 in that data, which string handling functions will take to be the end of the string. Alternatively, the data may contain no nul bytes at all and not be nul-terminated, which will cause the string functions to run past the end of your buffer.
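To make the difference concrete, here is a minimal sketch of copying binary data with an explicit length. The helper name blob_copy() and its return convention are my own assumptions for illustration, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a binary blob whose length is tracked separately.
   Returns the number of bytes copied, or 0 if the destination is too small. */
size_t blob_copy(unsigned char *dst, size_t dst_size,
                 const unsigned char *src, size_t src_len)
{
    if (src_len > dst_size) {
        return 0;               /* refuse rather than overflow */
    }
    memcpy(dst, src, src_len);  /* copies every byte, zero bytes included */
    return src_len;
}
```

With a 5-byte input such as { 0x41, 0x00, 0x42, 0x00, 0x43 }, strlen() would report 1, while blob_copy() moves all 5 bytes because the length is passed in rather than inferred from the data.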

Well, these functions do indeed copy null-terminated strings, and not only in the embedded domain. Depending on your needs, you may want to use the mem* functions instead.

As others have already answered, they work fine for strings. Encrypted data can't be regarded as strings.
There is however the aspect of using any C library function in embedded systems, particularly in high-integrity real-time embedded systems, such as automotive/medical/avionics etc. On such projects, a coding standard will be used, such as MISRA-C.
The vast majority of C libraries are likely not compatible with your coding standard. And even if you have the option (at least in MISRA-C) to make deviations, you would still have to verify the whole library. For example you will have to verify the whole string.h, just because you used strlen(). Common practice in such systems is to write all functions yourself, particularly simple ones like strlen() which you can write yourself in a minute.
But most embedded systems don't have such high requirements for quality and safety, and then the library functions are preferable. This applies particularly to memcpy() and similar search/sort/move functions, which are likely to be heavily optimized by the compiler.
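For what it's worth, a hand-written strlen() of the kind mentioned above could look roughly like this. The name my_strlen() is hypothetical, and this sketch makes no claim about satisfying any particular coding standard.

```c
#include <stddef.h>

/* Hypothetical replacement for strlen(): counts chars up to the terminator.
   The caller must pass a valid, null-terminated string. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}
```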

If you are worried about overwriting buffers (which everybody really should be), use strncpy or strncat instead. I see no problem with strlen.

This issue is specific to the system you describe, not to embedded systems per se. Either way, the string functions are simply not suited to the application you describe. I think you should simply have been told that you can't use string functions on the encrypted data in your particular application. That is not an issue with embedded systems, or even the string library. It is entirely about the nature of your encrypted strings: they are no longer C strings once encrypted, so any string library operation would no longer be valid. The content becomes just data, and it would be your responsibility to retain any necessary meta-data regarding length etc. You could use Pascal style strings to do that for example (with a suitable accompanying library).
Now in general the C string library, and C strings themselves, present a number of issues for all systems, not just embedded. See this article by Joel Spolsky to see why caution should be used when using C string functions, especially strcat().
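As a rough sketch of the Pascal-style idea, the length can travel with the data in one structure. The names struct pbuf, pbuf_set() and the 255-byte capacity are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <string.h>

#define PBUF_MAX 255u   /* arbitrary capacity chosen for this sketch */

/* Hypothetical Pascal-style buffer: the length is stored next to the data,
   so zero bytes inside the payload carry no special meaning. */
struct pbuf {
    unsigned char len;              /* number of valid bytes in data */
    unsigned char data[PBUF_MAX];
};

/* Store n bytes; returns 0 on success, -1 if they do not fit. */
int pbuf_set(struct pbuf *p, const void *src, size_t n)
{
    if (n > PBUF_MAX) {
        return -1;
    }
    memcpy(p->data, src, n);
    p->len = (unsigned char)n;
    return 0;
}
```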

The reason is just what you said:
because its end with null and sometimes we works on encrypted data and null character comes so we cant got actual result because these functions stop on null character.
As for alternatives, I recommend the strn* series, such as strncpy and strnlen. The n here is the maximum possible length of the string.
You may want to find a C standard library reference and look up the details of those strn* functions.

As others have said str* functions are for strings, not binary data.
However, I suggest that when you do come to use strings, you should consider functions such as strlcpy() instead of strcpy(), and strlcat() instead of strcat().
They're not standard functions, but you'll be able to find copies of them readily enough (or really just write your own). They take the size of the destination buffer as an extra parameter to their standard cousins and are designed to avoid buffer overflows.
It probably seems like an imposition to have to pass around the size of a pointer's block wherever you use it, but I'm afraid that's what programming in C is about.
At least until we get smarter pointers that is.
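If you do end up writing your own, a minimal sketch could look like the following. It is named my_strlcpy() to make clear that it is a hypothetical stand-in modelled on the behaviour described above, not the BSD implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(): copies at most dst_size - 1 chars and
   always terminates dst when dst_size > 0. Returns strlen(src), so a return
   value >= dst_size tells the caller that the copy was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);

    if (dst_size > 0) {
        size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A typical call would be `if (my_strlcpy(buf, input, sizeof buf) >= sizeof buf) { /* handle truncation */ }`.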

Related

Unsafe C functions in HP-UX Environment

We are developing a scheduler application in the C programming language. We use the HP-UX environment to compile and deploy the code. During the yearly external audit of the application, we received a report containing the following observations.
Dangerous functions: strcpy, strlen, strcat etc.
Buffer overflow: memcpy
Buffer overflow format string: sprintf, snprintf etc.
Format string: printf, sprintf etc.
They also gave a general recommendation to use safe functions such as:
strncpy_s
strnlen_s
strncat_s
memcpy_s etc..
Now, the problem is that there is no such library available for the HP-UX environment. The functions listed above are supported only in the Windows environment.
Is there any alternative to these dangerous functions available in a Linux environment?
How can we mitigate the "buffer overflow format string" and "format string" categories?
See Do you use the TR 24731 'safe' functions? for a discussion of the demerits of the _s functions.
Functions such as strcpy() are safe if (and only if) you know how big the source string and the target strings are. If you don't know, you're playing with fire.
Buffer overflows with memcpy() are outright bugs in your program; you can't use it reliably if you don't know the sizes, or that the buffers do not overlap (memmove() is safer; it handles overlaps). There's an argument to say "you don't need strcpy() or strcat() etc because if you have enough data to use them safely, you can use memmove() or memcpy() instead". On the whole, strlen() is pretty safe — as long as you pass it a string. If you don't know whether you're dealing with strings, then you've got lots of problems; you must know that you're dealing with strings to call the string manipulation functions.
Note that the strncpy() and strncat() functions are not safe. The problem with strncpy() is that it does not null terminate the string if the source is too long. The problem with strncat() is that passing sizeof(dst) as the size of the destination is wrong, even if the string is empty; it has one of the weirdest, most bug-prone interfaces of any extant C function — gets() is no longer counted as extant. If you know the sizes of everything, you don't need them. If you don't know the sizes, using them won't make you safe.
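The strncpy() termination problem is easy to observe. In this sketch the function name strncpy_left_unterminated() is made up for the demonstration; it reports whether the destination ended up without a terminator.

```c
#include <stddef.h>
#include <string.h>

/* Demonstration helper (hypothetical name): performs the copy and returns 1
   if strncpy() left dst without a null terminator, 0 otherwise. */
int strncpy_left_unterminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);
    return memchr(dst, '\0', dst_size) == NULL;
}
```

With a 4-byte destination, copying "abcdef" fills all four bytes with "abcd" and leaves no room for a terminator, whereas copying "abc" is terminated as expected.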
Using sprintf() is unnecessarily dangerous; using snprintf() should be safe as long as you get the size correct and pay attention to data truncation by testing the return value. Check to see whether asprintf() and vasprintf() are available — and consider using them if they are.
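Testing the snprintf() return value might look like this. The function format_job_name() and its "name-id" format are assumptions for the sake of the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "name-id" into dst.
   Returns 0 on success, -1 on an output error or if the result was truncated. */
int format_job_name(char *dst, size_t dst_size, const char *name, int id)
{
    int n = snprintf(dst, dst_size, "%s-%d", name, id);

    if (n < 0) {
        return -1;                    /* output or encoding error */
    }
    if ((size_t)n >= dst_size) {
        return -1;                    /* truncated: n is the length needed */
    }
    return 0;
}
```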
Format string vulnerabilities arise where you have:
printf(fmtstr, value1, value2);
where the fmtstr argument can be controlled or influenced by the user. If you can determine where the format string comes from and know it is safe, then there isn't a problem, and it can help with the internationalization of your code. If you can't determine that the format string is safe, you are running risks. How serious those risks are depends on the context in which it is used. If the user root will be running the code, which seems likely for a scheduler, then you must be meticulous. You may be able to be a little more blasé if the users running the code will not be root, but it is difficult to ensure that no-one ever runs the code as root.
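When the text itself comes from the user, the usual fix is to pass it as an argument to a constant format. A small sketch, with render_user_text() as a hypothetical name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the format string is a constant, and the untrusted
   text is only ever an argument, so any % sequences in it are copied
   literally instead of being interpreted. */
int render_user_text(char *dst, size_t dst_size, const char *user_text)
{
    /* Unsafe alternative (do not do this): snprintf(dst, dst_size, user_text); */
    return snprintf(dst, dst_size, "%s", user_text);
}
```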
You're right that the _s functions are not available except on Windows. The external auditors have been downright unhelpful — suggesting the use of functions that are not available on the target platform is counter-productive. There is room to debate whether using the _s functions is sufficient, Microsoft notwithstanding. They can be misused, just as any function can. See the N1967 paper referenced in my answer to the TR 24731 question. (There are later papers available from the C standard committee's web site at http://www.open-std.org/jtc1/sc22/wg14/ which don't entirely agree with N1967 — N2336 from the Pre-London 2019 mailing, for example. I'm not sure I entirely agree with N2336.)
Consider whether strlcpy() and strlcat() are available and could/should be used for strcpy(), strncpy(), strcat(), strncat().

What is the purpose of using memory stream in the C standard library?

In the C standard library, what is the purpose of using a memory stream (as created for an array via fmemopen())? How is it compared to manipulating the array directly?
This is very similar to using the std::stringstream in C++, which allows you to write to a string (including '\0' characters) and then use the string the way you'd like.
The idea is that we have many functions at our disposal, such as fprintf(), which can be used to write data to a stream in a formatted way. All those functions can be used with a memory based file without any need for further changes anywhere else than the fopen() to fmemopen().
So if you want to create a string which requires many fprintf(), using that function to generate the string in memory is extremely useful. The snprintf() could also be used if you just need one quick conversion.
Similarly, you can of course use fread() and fwrite() and the like. If you need to create a file which requires a lot of seeking, and it is small enough to fit easily in memory, then working on a memory stream is going to be a lot faster. Once done, you can save the results to disk.
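Here is a small sketch of building a string with several fprintf() calls on a memory stream. It assumes a POSIX.1-2008 system where fmemopen() is available; build_report() and the record format are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* fmemopen() is POSIX, not ISO C */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes two "key=value;" records into buf through a
   memory stream. Returns 0 on success, -1 on failure. */
int build_report(char *buf, size_t buf_size)
{
    FILE *mem = fmemopen(buf, buf_size, "w");
    int ok;

    if (mem == NULL) {
        return -1;
    }
    ok = fprintf(mem, "id=%d;", 7) >= 0
      && fprintf(mem, "temp=%.1f;", 21.5) >= 0;
    if (fclose(mem) != 0) {       /* buffered output is flushed into buf here */
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

The same fprintf() calls would work unchanged on a stream obtained from fopen(), which is the main attraction of the approach.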

String-handling practices in C [closed]

I'm starting a new project in plain C (c99) that is going to work primarily with text. Because of external project constraints, this code has to be extremely simple and compact, consisting of a single source-code file without external dependencies or libraries except for libc and similar ubiquitous system libraries.
With that understanding, what are some best-practices, gotchas, tricks, or other techniques that can help make the string handling of the project more robust and secure?
Without any additional information about what your code is doing, I would recommend designing all your interfaces like this:
size_t foobar(char *dest, size_t buf_size, /* operands here */)
with semantics like snprintf:
dest points to a buffer of size at least buf_size.
If buf_size is zero, null/invalid pointers are acceptable for dest and nothing will be written.
If buf_size is non-zero, dest is always null-terminated.
Each function foobar returns the length of the full non-truncated output; the output has been truncated if buf_size is less than or equal to the return value.
This way, when the caller can easily know the destination buffer size that's required, a sufficiently large buffer can be obtained in advance. If the caller cannot easily know, it can call the function once with either a zero argument for buf_size, or with a buffer that's "probably big enough" and only retry if you ran out of space.
You can also make a wrapped version of such calls analogous to the GNU asprintf function, but if you want your code to be as flexible as possible I would avoid doing any allocation in the actual string functions. Handling the possibility of failure is always easier at the caller level, and many callers can ensure that failure is never a possibility by using a local buffer or a buffer that was obtained much earlier in the program so that the success or failure of a larger operation is atomic (which greatly simplifies error handling).
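As one possible instance of that interface, here is a sketch of a function that joins two strings with a separator. The name join_pair() and its operands are assumptions; only the contract (size in, full length out, always terminated when buf_size is non-zero) comes from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example of the interface: writes a, sep, b into dest.
   Returns the full untruncated length; the output was truncated if the
   return value is >= buf_size. dest may be NULL when buf_size is 0. */
size_t join_pair(char *dest, size_t buf_size,
                 const char *a, char sep, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t full = la + 1 + lb;
    size_t room, pos = 0, n;

    if (buf_size == 0) {
        return full;                      /* nothing is written */
    }
    room = buf_size - 1;                  /* keep one byte for the terminator */

    n = (la < room) ? la : room;
    memcpy(dest, a, n);
    pos += n;
    if (pos < room) {
        dest[pos++] = sep;
    }
    n = (lb < room - pos) ? lb : room - pos;
    memcpy(dest + pos, b, n);
    pos += n;
    dest[pos] = '\0';
    return full;
}
```

A caller can first call join_pair(NULL, 0, a, sep, b) to learn the required size, then allocate that size plus one and call again.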
Some thoughts from a long-time embedded developer, most of which elaborate on your requirement for simplicity and are not C-specific:
Decide which string-handling functions you'll need, and keep that set as small as possible to minimize the points of failure.
Follow R.'s suggestion to define a clear interface that is consistent across all string handlers. A strict, small-but-detailed set of rules allows you to use pattern-matching as a debugging tool: you can be suspicious of any code that looks different from the rest.
As Bart van Ingen Schenau noted, track the buffer length independently of the string length. If you'll always be working with text it's safe to use the standard null character to indicate end-of-string, but it's up to you to ensure the text+null will fit in the buffer.
Ensure consistent behavior across all string handlers, particularly where the standard functions are lacking: truncation, null inputs, null-termination, padding, etc.
If you absolutely need to violate any of your rules, create a separate function for that purpose and name it appropriately. In other words, give each function a single unambiguous behavior. So you might use str_copy_and_pad() for a function that always pads its target with nulls.
Wherever possible, use safe built-in functions (e.g. memmove() per Jonathan Leffler) to do the heavy lifting. But test them to be sure they're doing what you think they're doing!
Check for errors as soon as possible. Undetected buffer overruns can lead to "ricochet" errors that are notoriously difficult to locate.
Write tests for every function to ensure it satisfies its contract. Be sure to cover the edge cases (off by 1, null/empty strings, source/destination overlap, etc.) And this may sound obvious, but be sure you understand how to create and detect a buffer underrun/overrun, then write tests that explicitly generate and check for those problems. (My QA folks are probably sick of hearing my instructions to "don't just test to make sure it works; test to make sure it doesn't break.")
Here are some techniques that have worked for me:
Create wrappers for your memory-management routines that allocate "fence bytes" on either end of your buffers during allocation and check them upon deallocation. You can also verify them within your string handlers, perhaps when a STR_DEBUG macro is set. Caveat: you'll need to test your diagnostics thoroughly, lest they create additional points of failure.
Create a data structure that encapsulates both the buffer and its length. (It can also contain the fence bytes if you use them.) Caveat: you now have a non-standard data structure that your entire code base must manage, which may mean a substantial re-write (and therefore additional points of failure).
Make your string handlers validate their inputs. If a function forbids null pointers, check for them explicitly. If it requires a valid string (like strlen() should) and you know the buffer length, check that the buffer contains a null character. In other words, verify any assumptions you might be making about the code or data.
Write your tests first. That will help you understand each function's contract--exactly what it expects from the caller, and what the caller should expect from it. You'll find yourself thinking about the ways you'll use it, the ways it might break, and about the edge cases it must handle.
Thanks so much for asking this question! I wish more developers would think about these issues--especially before they start coding. Good luck, and best wishes for a robust, successful product!
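To give an idea of the "fence bytes" technique mentioned above, here is a minimal sketch. The names fenced_alloc(), fenced_check() and fenced_free(), the fence size and the fence value are all my own assumptions. The returned pointer is offset from what malloc() returned, so this version is only suitable for char buffers, not for types with stricter alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 4u      /* arbitrary size for this sketch */
#define FENCE_BYTE 0xA5    /* arbitrary marker value */

/* Allocates n usable bytes with a fence before and after them. */
void *fenced_alloc(size_t n)
{
    unsigned char *raw;

    if (n > SIZE_MAX - 2 * FENCE_SIZE) {
        return NULL;
    }
    raw = malloc(n + 2 * FENCE_SIZE);
    if (raw == NULL) {
        return NULL;
    }
    memset(raw, FENCE_BYTE, FENCE_SIZE);
    memset(raw + FENCE_SIZE + n, FENCE_BYTE, FENCE_SIZE);
    return raw + FENCE_SIZE;
}

/* Returns 1 if both fences around the n-byte block are intact, else 0. */
int fenced_check(const void *p, size_t n)
{
    const unsigned char *raw = (const unsigned char *)p - FENCE_SIZE;
    size_t i;

    for (i = 0; i < FENCE_SIZE; ++i) {
        if (raw[i] != FENCE_BYTE || raw[FENCE_SIZE + n + i] != FENCE_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* Releases a block obtained from fenced_alloc(). */
void fenced_free(void *p)
{
    if (p != NULL) {
        free((unsigned char *)p - FENCE_SIZE);
    }
}
```

In a debug build, a string handler could call fenced_check() on its destination after writing and report a failure immediately, which is much easier to trace than a crash somewhere else later on.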
Have a look at strlcpy and strlcat; see the original paper for details.
Two cents:
Always use the "n" version of the string functions: strncpy, strncmp, (or wcsncpy, wcsncmp etc.)
Always allocate using the +1 idiom: e.g. char str[MAX_STR_SIZE+1], and then pass MAX_STR_SIZE as the size for the "n" version of the string functions and finish with str[MAX_STR_SIZE] = '\0'; to make sure all strings are properly finalized.
The final step is important since the "n" version of the string functions won't append '\0' after copying if the maximum size was reached.
Work with arrays on the stack whenever this is possible and initialize them properly. You don't have to keep track of allocations, sizes and initializations.
char myCopy[] = { "the interesting string" };
For medium sized strings C99 has VLA. They are a bit less usable since you can't initialize them. But you still have the first two of the above advantages.
char myBuffer[n];
myBuffer[0] = '\0';
Some important gotchas are:
In C, there is no relation at all between string length and buffer size. A string always runs up to (and including) the first '\0'-character. It is your responsibility as a programmer to make sure this character can be found within the reserved buffer for that string.
Always explicitly keep track of buffer sizes. The compiler keeps track of array sizes, but that information will be lost to you before you know it.
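A short sketch of what "the information will be lost" means in practice: once an array is passed to a function it is just a pointer, so the size has to be passed alongside it. ARRAY_LEN and fill() are hypothetical names for this example.

```c
#include <stddef.h>

/* Only valid on a real array, at the point where it is declared. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical helper: fills buf with c and terminates it. Inside this
   function sizeof buf would be the size of a pointer, not of the caller's
   array, so the capacity must arrive as a separate parameter.
   Returns the number of characters written before the terminator. */
size_t fill(char *buf, size_t buf_size, char c)
{
    size_t i;

    if (buf_size == 0) {
        return 0;
    }
    for (i = 0; i < buf_size - 1; ++i) {
        buf[i] = c;
    }
    buf[i] = '\0';
    return i;
}
```

A caller with `char line[10];` would write `fill(line, ARRAY_LEN(line), '-')`.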
When it comes to time vs. space trade-offs, don't forget the standard bit-twiddling tricks from here.
During my early firmware projects, I used lookup tables to count the set bits in O(1) time.
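A lookup-table bit count along those lines might look like this sketch. The 16-entry nibble table and the name popcount8() are my assumptions; the original projects may well have used a different table size.

```c
#include <stdint.h>

/* Number of set bits for every 4-bit value. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical helper: counts the set bits in one byte with two lookups. */
unsigned popcount8(uint8_t v)
{
    return NIBBLE_BITS[v & 0x0Fu] + NIBBLE_BITS[v >> 4];
}
```

A full 256-entry table would need a single lookup per byte at the cost of more ROM, which is the time vs. space trade-off in miniature.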

Which functions from the standard library must (should) be avoided?

I've read on Stack Overflow that some C functions are "obsolete" or "should be avoided". Can you please give me some examples of this kind of function and the reason why?
What alternatives to those functions exist?
Can we use them safely - any good practices?
Deprecated Functions
Insecure
A perfect example of such a function is gets(), because there is no way to tell it how big the destination buffer is. Consequently, any program that reads input using gets() has a buffer overflow vulnerability. For similar reasons, one should use strncpy() in place of strcpy() and strncat() in place of strcat().
Further examples include the tmpfile() and mktemp() functions, which have potential security issues with overwriting temporary files and which are superseded by the more secure mkstemp() function.
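For the gets() case, the usual replacement is fgets(), which does take the buffer size. A minimal sketch, with read_line() as a hypothetical wrapper name:

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets(): reads at most buf_size - 1 chars of
   one line and strips the trailing newline if it was read.
   Returns buf, or NULL on end-of-file, error, or an unusable buffer size. */
char *read_line(FILE *in, char *buf, size_t buf_size)
{
    if (buf_size == 0 || buf_size > INT_MAX) {
        return NULL;
    }
    if (fgets(buf, (int)buf_size, in) == NULL) {
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Note that an over-long line is split rather than discarded: the remainder stays in the stream and is returned by the next call.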
Non-Reentrant
Other examples include gethostbyaddr() and gethostbyname() which are non-reentrant (and, therefore, not guaranteed to be threadsafe) and have been superseded by the reentrant getaddrinfo() and freeaddrinfo().
You may be noticing a pattern here... either lack of security (possibly by failing to include enough information in the signature to possibly implement it securely) or non-reentrance are common sources of deprecation.
Outdated, Non-Portable
Some other functions simply become deprecated because they duplicate functionality and are not as portable as other variants. For example, bzero() is deprecated in favor of memset().
Thread Safety and Reentrance
You asked, in your post, about thread safety and reentrance. There is a slight difference. A function is reentrant if it does not use any shared, mutable state. So, for example, if all the information it needs is passed into the function, and any buffers needed are also passed into the function (rather than shared by all calls to the function), then it is reentrant. That means that different threads, by using independent parameters, do not risk accidentally sharing state. Reentrancy is a stronger guarantee than thread safety. A function is thread safe if it can be used by multiple threads concurrently. A function is thread safe if:
It is reentrant (i.e. it does not share any state between calls), or:
It is non-reentrant, but it uses synchronization/locking as needed for shared state.
In general, in the Single UNIX Specification and IEEE 1003.1 (i.e. "POSIX"), any function which is not guaranteed to be reentrant is not guaranteed to be thread safe. So, in other words, only functions which are guaranteed to be reentrant may be portably used in multithreaded applications (without external locking). That does not mean, however, that implementations of these standards cannot choose to make a non-reentrant function threadsafe. For example, Linux frequently adds synchronization to non-reentrant functions in order to add a guarantee (beyond that of the Single UNIX Specification) of threadsafety.
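The difference can be sketched with two versions of the same function. Both names, label_static() and label_r(), are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Non-reentrant: every call writes into the same static buffer, so a later
   call (possibly from another thread) overwrites an earlier result. */
const char *label_static(int id)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "job-%d", id);
    return buf;
}

/* Reentrant: all state is supplied by the caller, so calls using separate
   buffers cannot interfere with each other. */
char *label_r(int id, char *buf, size_t buf_size)
{
    snprintf(buf, buf_size, "job-%d", id);
    return buf;
}
```

This is the same pattern as the pairs of functions in POSIX where the reentrant variant takes an extra caller-provided buffer or state argument.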
Strings (and Memory Buffers, in General)
You also asked if there is some fundamental flaw with strings/arrays. Some might argue that this is the case, but I would argue that no, there is no fundamental flaw in the language. C and C++ require you to pass the length/capacity of an array separately (it is not a ".length" property as in some other languages). This is not a flaw, per-se. Any C and C++ developer can write correct code simply by passing the length as a parameter where needed. The problem is that several APIs that required this information failed to specify it as a parameter. Or assumed that some MAX_BUFFER_SIZE constant would be used. Such APIs have now been deprecated and replaced by alternative APIs that allow the array/buffer/string sizes to be specified.
Scanf (In Answer to Your Last Question)
Personally, I use the C++ iostreams library (std::cin, std::cout, the << and >> operators, std::getline, std::istringstream, std::ostringstream, etc.), so I do not typically deal with that. If I were forced to use pure C, though, I would personally just use fgetc() or getchar() in combination with strtol(), strtoul(), etc. and parse things manually, since I'm not a huge fan of varargs or format strings. That said, to the best of my knowledge, there is no problem with [f]scanf(), [f]printf(), etc. so long as you craft the format strings yourself, you never pass arbitrary format strings or allow user input to be used as format strings, and you use the formatting macros defined in <inttypes.h> where appropriate. (Note, snprintf() should be used in place of sprintf(), but that has to do with failing to specify the size of the destination buffer and not the use of format strings). I should also point out that, in C++, boost::format provides printf-like formatting without varargs.
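For the <inttypes.h> remark, a small sketch of what using the formatting macros looks like. The function format_counter() and its output format are assumptions for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the PRI* macros expand to the correct conversion
   specifier for each fixed-width type on the current platform. */
int format_counter(char *dst, size_t dst_size, uint32_t count, int64_t offset)
{
    return snprintf(dst, dst_size,
                    "count=%" PRIu32 " offset=%" PRId64, count, offset);
}
```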
Once again people are repeating, mantra-like, the ludicrous assertion that the "n" version of str functions are safe versions.
If that was what they were intended for then they would always null terminate the strings.
The "n" versions of the functions were written for use with fixed length fields (such as directory entries in early file systems) where the nul terminator is only required if the string does not fill the field. This is also the reason why the functions have strange side effects that are pointlessly inefficient if just used as replacements - take strncpy() for example:
If the array pointed to by s2 is a string that is shorter than n bytes, null bytes are appended to the copy in the array pointed to by s1, until n bytes in all are written.
As buffers allocated to handle filenames are typically 4kbytes this can lead to a massive deterioration in performance.
If you want "supposedly" safe versions then obtain - or write your own - strl routines (strlcpy, strlcat etc) which always nul terminate the strings and don't have side effects. Please note though that these aren't really safe as they can silently truncate the string - this is rarely the best course of action in any real-world program. There are occasions where this is OK but there are also many circumstances where it could lead to catastrophic results (e.g. printing out medical prescriptions).
Several answers here suggest using strncat() over strcat(); I'd suggest that strncat() (and strncpy()) should also be avoided. It has problems that make it difficult to use correctly and lead to bugs:
the length parameter to strncat() is related to (but not quite exactly - see the 3rd point) the maximum number of characters that can be copied to the destination rather than the size of the destination buffer. This makes strncat() more difficult to use than it should be, particularly if multiple items will be concatenated to the destination.
it can be difficult to determine if the result was truncated (which may or may not be important)
it's easy to have an off-by-one error. As the C99 standard notes, "Thus, the maximum number of characters that can end up in the array pointed to by s1 is strlen(s1)+n+1" for a call that looks like strncat( s1, s2, n)
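To show the arithmetic these points imply, here is a sketch of a wrapper that takes the destination's total size and derives the strncat() length from it. append_bounded() is a hypothetical name, and the return convention is my own choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: appends src to the string in dst, where dst_size is
   the total size of the destination buffer. Returns 1 if all of src fit,
   0 if it was truncated or dst held no terminated string. */
int append_bounded(char *dst, size_t dst_size, const char *src)
{
    const char *end = memchr(dst, '\0', dst_size);
    size_t used, room;

    if (end == NULL) {
        return 0;                     /* dst is not terminated in its buffer */
    }
    used = (size_t)(end - dst);
    room = dst_size - used - 1;       /* space left, excluding the terminator */
    strncat(dst, src, room);          /* strncat adds the terminator itself */
    return strlen(src) <= room;
}
```

The third argument is "remaining space minus one", not sizeof dst, which is exactly the part that is easy to get wrong when strncat() is called directly.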
strncpy() also has an issue that can result in bugs if you try to use it in an intuitive way: it doesn't guarantee that the destination is null terminated. To ensure that, you have to handle that corner case specifically by dropping a '\0' in the buffer's last location yourself (at least in certain situations).
I'd suggest using something like OpenBSD's strlcat() and strlcpy() (though I know that some people dislike those functions; I believe they're far easier to use safely than strncat()/strncpy()).
Here's a little of what Todd Miller and Theo de Raadt had to say about problems with strncat() and strncpy():
There are several problems encountered when strncpy() and strncat() are used as safe versions of strcpy() and strcat(). Both functions deal with NUL-termination and the length parameter in different and non-intuitive ways that confuse even experienced programmers. They also provide no easy way to detect when truncation occurs. ... Of all these issues, the confusion caused by the length parameters and the related issue of NUL-termination are most important. When we audited the OpenBSD source tree for potential security holes we found rampant misuse of strncpy() and strncat(). While not all of these resulted in exploitable security holes, they made it clear that the rules for using strncpy() and strncat() in safe string operations are widely misunderstood.
OpenBSD's security audit found that bugs with these functions were "rampant". Unlike gets(), these functions can be used safely, but in practice there are a lot of problems because the interface is confusing, unintuitive and difficult to use correctly. I know that Microsoft has also done analysis (though I don't know how much of their data they may have published), and as a result have banned (or at least very strongly discouraged - the 'ban' might not be absolute) the use of strncat() and strncpy() (among other functions).
Some links with more information:
http://www.usenix.org/events/usenix99/full_papers/millert/millert_html/
http://en.wikipedia.org/wiki/Off-by-one_error#Security_implications
http://blogs.msdn.com/michael_howard/archive/2004/10/29/249713.aspx
http://blogs.msdn.com/michael_howard/archive/2004/11/02/251296.aspx
http://blogs.msdn.com/michael_howard/archive/2004/12/10/279639.aspx
http://blogs.msdn.com/michael_howard/archive/2006/10/30/something-else-to-look-out-for-when-reviewing-code.aspx
Standard library functions that should never be used:
setjmp.h
setjmp(). Together with longjmp(), these functions are widely recognized as incredibly dangerous to use: they lead to spaghetti programming, they come with numerous forms of undefined behavior, and they can cause unintended side-effects in the program environment, such as affecting values stored on the stack. References: MISRA-C:2012 rule 21.4, CERT C MSC22-C.
longjmp(). See setjmp().
stdio.h
gets(). The function has been removed from the C language (as per C11), as it was unsafe as per design. The function was already flagged as obsolete in C99. Use fgets() instead. References: ISO 9899:2011 K.3.5.4.1, also see note 404.
stdlib.h
atoi() family of functions. These have no error handling but invoke undefined behavior whenever errors occur. Completely superfluous functions that can be replaced with the strtol() family of functions. References: MISRA-C:2012 rule 21.7.
string.h
strncat(). Has an awkward interface that is often misused. It is mostly a superfluous function. Also see remarks for strncpy().
strncpy(). The intention of this function was never to be a safer version of strcpy(). Its sole purpose was always to handle an ancient string format on Unix systems, and that it got included in the standard library is a known mistake. This function is dangerous because it may leave the string without null termination and programmers are known to often use it incorrectly. References: Why are strlcpy and strlcat considered insecure?, with a more detailed explanation here: Is strcpy dangerous and what should be used instead?.
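Regarding the atoi() remark in the list above, a strtol()-based conversion with error handling could be sketched as follows. The name parse_int() and its return convention are assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical replacement for atoi(): converts a decimal string to int.
   Returns 0 and stores the value in *out on success; returns -1 if the text
   is empty, has trailing characters, or is out of range for int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return -1;                    /* no digits, or junk after the number */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return -1;                    /* does not fit in an int */
    }
    *out = (int)v;
    return 0;
}
```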
Standard library functions that should be used with caution:
assert.h
assert(). Comes with overhead and should generally not be used in production code. It is better to use an application-specific error handler which displays errors but does not necessarily close down the whole program.
signal.h
signal(). References: MISRA-C:2012 rule 21.5, CERT C SIG32-C.
stdarg.h
va_arg() family of functions. The presence of variable-length functions in a C program is almost always an indication of poor program design. Should be avoided unless you have very specific requirements.
stdio.h
Generally, this whole library is not recommended for production code, as it comes with numerous cases of poorly-defined behavior and poor type safety.
fflush(). Perfectly fine to use for output streams. Invokes undefined behavior if used for input streams.
gets_s(). Safe version of gets() included in C11 bounds-checking interface. It is preferred to use fgets() instead, as per C standard recommendation. References: ISO 9899:2011 K.3.5.4.1.
printf() family of functions. Resource heavy functions that come with lots of undefined behavior and poor type safety. sprintf() also has vulnerabilities. These functions should be avoided in production code. References: MISRA-C:2012 rule 21.6.
scanf() family of functions. See remarks about printf(). Also, scanf() is vulnerable to buffer overruns if not used correctly. fgets() is preferred when possible. References: CERT C INT05-C, MISRA-C:2012 rule 21.6.
tmpfile() family of functions. Comes with various vulnerability issues. References: CERT C FIO21-C.
stdlib.h
malloc() family of functions. Perfectly fine to use in hosted systems, though be aware of well-known issues in C90 and therefore don't cast the result. The malloc() family of functions should never be used in freestanding applications. References: MISRA-C:2012 rule 21.3.
Also note that realloc() is dangerous in case you overwrite the old pointer with the result of realloc(). In case the function fails, you create a leak.
system(). Comes with lots of overhead and although portable, it is often better to use system-specific API functions instead. Comes with various poorly-defined behavior. References: CERT C ENV33-C.
string.h
strcat(). See remarks for strcpy().
strcpy(). Perfectly fine to use, unless the size of the data to be copied is unknown or larger than the destination buffer. If no check of the incoming data size is done, there may be buffer overruns. Which is no fault of strcpy() itself, but of the calling application - that strcpy() is unsafe is mostly a myth created by Microsoft.
strtok(). Alters the caller string and uses internal state variables, which could make it unsafe in a multi-threaded environment.
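The realloc() pitfall from the stdlib.h remarks above is avoided by keeping the result in a temporary pointer first. A minimal sketch, with grow_buffer() as a hypothetical helper and the assumption that new_size is non-zero:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes (new_size > 0).
   Returns 0 on success. On failure returns -1 and leaves *buf untouched, so
   the caller still owns the old block and can free it or keep using it. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);

    if (tmp == NULL) {
        return -1;        /* writing buf = realloc(buf, ...) would leak here */
    }
    *buf = tmp;
    return 0;
}
```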
Some people would claim that strcpy and strcat should be avoided, in favor of strncpy and strncat. This is somewhat subjective, in my opinion.
They should definitely be avoided when dealing with user input - no doubt here.
In code "far" from the user, when you just know the buffers are long enough, strcpy and strcat may be a bit more efficient because computing the n to pass to their cousins may be superfluous.
Avoid
strtok for multithreaded programs as it's not thread-safe.
gets as it could cause buffer overflow
It is probably worth adding again that strncpy() is not the general-purpose replacement for strcpy() that its name might suggest. It is designed for fixed-length fields that don't need a nul-terminator (it was originally designed for use with UNIX directory entries, but can be useful for things like encryption key fields).
It is easy, however, to use strncat() as a replacement for strcpy():
if (dest_size > 0)
{
dest[0] = '\0';
strncat(dest, source, dest_size - 1);
}
(The if test can obviously be dropped in the common case, where you know that dest_size is definitely nonzero).
Also check out Microsoft's list of banned APIs. These are APIs (including many already listed here) that are banned from Microsoft code because they are often misused and lead to security problems.
You may not agree with all of them, but they are all worth considering. They add an API to the list when its misuse has led to a number of security bugs.
It is very hard to use scanf safely. Good use of scanf can avoid buffer overflows, but you are still vulnerable to undefined behavior when reading numbers that don't fit in the requested type. In most cases, fgets followed by self-parsing (using sscanf, strchr, etc.) is a better option.
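As an illustration of the "fgets followed by self-parsing" approach, here is a sketch that parses one line with strtol() so that out-of-range input is reported instead of causing undefined behavior. parse_int_line is a hypothetical helper name, and the line is assumed to have been read with fgets() already:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal int from a line that was read
 * with fgets(). Returns 0 on success, -1 if the text is not a number,
 * has trailing junk, or does not fit in an int. */
int parse_int_line(const char *line, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line) {
        return -1;                      /* no digits at all */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return -1;                      /* out of range: reported, not undefined */
    }
    if (*end != '\n' && *end != '\0') {
        return -1;                      /* trailing characters after the number */
    }
    *out = (int)value;
    return 0;
}
```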
But I wouldn't say "avoid scanf all the time". scanf has its uses. As an example, let's say you want to read user input in a char array that's 10 bytes long. You want to remove the trailing newline, if any. If the user enters more than 9 characters before a newline, you want to store the first 9 characters in the buffer and discard everything until the next newline. You can do:
char buf[10];
scanf("%9[^\n]%*[^\n]", buf);
getchar();
Once you get used to this idiom, it's shorter and in some ways cleaner than:
char buf[10];
if (fgets(buf, sizeof buf, stdin) != NULL) {
char *nl;
if ((nl = strrchr(buf, '\n')) == NULL) {
int c;
while ((c = getchar()) != EOF && c != '\n') {
;
}
} else {
*nl = 0;
}
}
Almost any function that deals with NUL terminated strings is potentially unsafe.
If you are receiving data from the outside world and manipulating it via the str*() functions, then you are setting yourself up for catastrophe.
Don't forget about sprintf - it is the cause of many problems. The alternative, snprintf, is not a complete fix either, because implementations differ in ways that can make your code unportable.
linux: http://linux.die.net/man/3/snprintf
windows: http://msdn.microsoft.com/en-us/library/2ts7cx93%28VS.71%29.aspx
In case 1 (linux) the return value is the number of characters that would have been written if the buffer had been large enough, not counting the terminating NUL (if it is greater than or equal to the size of the given buffer then the output was truncated)
In case 2 (windows) the return value is a negative number in case the output is truncated.
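A wrapper can hide the difference between the two return conventions by treating both a negative value and a value that is not smaller than the buffer size as truncation. The following is a sketch under that assumption; format_checked is a hypothetical name, and it relies on vsnprintf() being available in your library:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: format into buf and report truncation the same
 * way regardless of which return convention the library follows.
 * Returns 0 if the whole output fit, -1 on error or truncation. */
int format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0) {
        return -1;              /* nothing can be stored, not even the NUL */
    }
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= size) {
        buf[size - 1] = '\0';   /* older non-C99 libraries may not terminate */
        return -1;
    }
    return 0;
}
```

With an 8-byte buffer, format_checked(buf, sizeof buf, "Data: %s", "Too much data") reports -1 and leaves a terminated, shortened string in buf.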
Generally you should avoid functions that are not:
buffer overflow safe (a lot of functions are already mentioned in here)
thread safe/not reentrant (strtok for example)
In the manual of each function you should search for keywords like: safe, sync, async, thread, buffer, bugs.
In all the string-copy/move scenarios - strcat(), strncat(), strcpy(), strncpy(), etc. - things go much better (safer) if a couple simple heuristics are enforced:
1. Always NUL-fill your buffer(s) before adding data.
2. Declare character-buffers as [SIZE+1], with a macro-constant.
For example, given:
#define BUFSIZE 10
char Buffer[BUFSIZE+1] = { 0x00 }; /* The compiler NUL-fills the rest */
we can use code like:
memset(Buffer,0x00,sizeof(Buffer));
strncpy(Buffer,"12345678901234567890",BUFSIZE);
relatively safely. The memset() should appear before the strncpy(), even though we initialized Buffer at compile-time, because we don't know what garbage other code placed into it before our function was called. The strncpy() will truncate the copied data to "1234567890", and will not NUL-terminate it. However, since we have already NUL-filled the entire buffer - sizeof(Buffer), rather than BUFSIZE - there is guaranteed to be a final "out-of-scope" terminating NUL anyway, as long as we constrain our writes using the BUFSIZE constant, instead of sizeof(Buffer).
Buffer and BUFSIZE likewise work fine for snprintf():
memset(Buffer,0x00,sizeof(Buffer));
if(snprintf(Buffer,BUFSIZE,"Data: %s","Too much data") >= BUFSIZE) {
/* Do some error-handling */
} /* If using MFC, you need if(... < 0), instead */
Even though snprintf() specifically writes only BUFSIZE-1 characters, followed by NUL, this works safely. So we "waste" an extraneous NUL byte at the end of Buffer, but we prevent both buffer-overflow and unterminated string conditions, for a pretty small memory-cost.
My call on strcat() and strncat() is more hard-line: don't use them. It is difficult to use strcat() safely, and the API for strncat() is so counter-intuitive that the effort needed to use it properly negates any benefit. I propose the following drop-in:
#define strncat(target,source,bufsize) snprintf(target,bufsize,"%s%s",target,source)
It is tempting to create a strcat() drop-in, but not a good idea:
#define strcat(target,source) snprintf(target,sizeof(target),"%s%s",target,source)
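because target may be a pointer (thus sizeof() does not return the information we need). Both macros also pass target to snprintf() as the destination and as a %s argument, and the C standard leaves such overlapping copies undefined, so treat them as illustrations only. I don't have a good "universal" solution to instances of strcat() in your code.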
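If the buffer size can be passed explicitly, a small function avoids both the sizeof() problem and the overlapping-argument problem. This is a sketch rather than a true drop-in, since the signature differs from strcat(); append_checked is a hypothetical name, and src must not overlap dst:

```c
#include <string.h>

/* Hypothetical helper: append src to the string in dst, where dst is a
 * buffer of dstsize bytes. Returns 0 on success, -1 if dst is not
 * terminated within dstsize or if the result would not fit (dst is then
 * left unchanged). src must be NUL-terminated and must not overlap dst. */
int append_checked(char *dst, size_t dstsize, const char *src)
{
    const char *end = memchr(dst, '\0', dstsize);
    size_t used, need;

    if (end == NULL) {
        return -1;                       /* dst is not a valid string */
    }
    used = (size_t)(end - dst);
    need = strlen(src) + 1;              /* include the terminator */
    if (need > dstsize - used) {
        return -1;                       /* would overflow: refuse instead of truncating */
    }
    memcpy(dst + used, src, need);
    return 0;
}
```

Refusing to append, instead of silently truncating, leaves the decision about what to do with oversized data to the caller.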
A problem I frequently encounter from "strFunc()-aware" programmers is an attempt to protect against buffer-overflows by using strlen(). This is fine if the contents are guaranteed to be NUL-terminated. Otherwise, strlen() itself can cause a buffer-overrun error (usually leading to a segmentation violation or other core-dump situation), before you ever reach the "problematic" code you are trying to protect.
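When termination is not guaranteed, the length can be measured with memchr(), which never looks past the size it is given. A minimal sketch; bounded_len is a hypothetical name (POSIX offers strnlen() for the same purpose, but it is not part of ISO C99):

```c
#include <string.h>

/* Hypothetical helper: length of the string in buf, looking at no more
 * than bufsize bytes. Returns bufsize if no NUL was found, which the
 * caller should treat as "not a valid string". */
size_t bounded_len(const char *buf, size_t bufsize)
{
    const char *nul = memchr(buf, '\0', bufsize);
    return nul ? (size_t)(nul - buf) : bufsize;
}
```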
atoi is not thread safe. I use strtol instead, per recommendation from the man page.

Why are strlcpy and strlcat considered insecure?

I understand that strlcpy and strlcat were designed as secure replacements for strncpy and strncat. However, some people are still of the opinion that they are insecure, and simply cause a different type of problem.
Can someone give an example of how using strlcpy or strlcat (i.e. a function that always null terminates its strings) can lead to security problems?
Ulrich Drepper and James Antill state this is true, but never provide examples or clarify this point.
Firstly, strlcpy has never been intended as a secure version of strncpy (and strncpy has never been intended as a secure version of strcpy). These two functions are totally unrelated. strncpy is a function that has no relation to C-strings (i.e. null-terminated strings) at all. The fact that it has the str... prefix in its name is just a historical blunder. The history and purpose of strncpy is well-known and well-documented. It is a function created for working with so-called "fixed width" strings (not with C-strings) used in some historical versions of the Unix file system. Some programmers today get confused by its name and assume that strncpy is somehow supposed to serve as a limited-length C-string copying function (a "secure" sibling of strcpy), which in reality is complete nonsense and leads to bad programming practice. The C standard library in its current form has no function for limited-length C-string copying whatsoever. This is where strlcpy fits in. strlcpy is indeed a true limited-length copying function created for working with C-strings. strlcpy correctly does everything a limited-length copying function should do. The only criticism one can aim at it is that it is, regretfully, not standard.
Secondly, strncat, on the other hand, is indeed a function that works with C-strings and performs a limited-length concatenation (it is indeed a "secure" sibling of strcat). In order to use this function properly the programmer has to take some special care, since the size parameter this function accepts is not really the size of the buffer that receives the result, but rather the size of its remaining part (also, the terminator character is counted implicitly). This can be confusing, since in order to tie that size to the size of the buffer, the programmer has to remember to perform some additional calculations, which is often used to criticize strncat. strlcat takes care of these issues, changing the interface so that no extra calculations are necessary (at least in the calling code). Again, the only basis I see one can criticise this on is that the function is not standard. Also, functions from the strcat group are something you won't see in professional code very often, due to the limited usability of the very idea of rescan-based string concatenation.
As for how these functions can lead to security problems... They simply can't. They can't lead to security problems in any greater degree than the C language itself can "lead to security problems". You see, for quite a while there was a strong sentiment out there that C++ language has to move in the direction of developing into some weird flavor of Java. This sentiment sometimes spills into the domain of C language as well, resulting in rather clueless and forced criticism of C language features and the features of C standard library. I suspect that we might be dealing with something like that in this case as well, although I surely hope things are not really that bad.
Ulrich's criticism is based on the idea that a string truncation that is not detected by the program can lead to security issues, through incorrect logic. Therefore, to be secure, you need to check for truncation. To do this for a string concatenation means that you are doing a check along the lines of this:
if (destlen + sourcelen > dest_maxlen)
{
/* Bug out */
}
Now, strlcat does effectively do this check, if the programmer remembers to check the result - so you can use it safely:
if (strlcat(dest, source, dest_bufferlen) >= dest_bufferlen)
{
/* Bug out */
}
Ulrich's point is that since you have to have destlen and sourcelen around (or recalculate them, which is what strlcat effectively does), you might as well just use the more efficient memcpy anyway:
if (destlen + sourcelen > dest_maxlen)
{
goto error_out;
}
memcpy(dest + destlen, source, sourcelen + 1);
destlen += sourcelen;
(In the above code, dest_maxlen is the maximum length of the string that can be stored in dest - one less than the size of the dest buffer. dest_bufferlen is the full size of the dest buffer).
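The memcpy approach can be wrapped up so that the length travels with the buffer and nothing ever has to be rescanned. This is a sketch of that idea, not code from Ulrich; struct strbuf and strbuf_append are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string buffer that carries its own length, so that no
 * call ever has to rescan the contents. */
struct strbuf {
    char  *data;     /* storage, always NUL-terminated */
    size_t len;      /* current string length, excluding the NUL */
    size_t maxlen;   /* longest string that fits: buffer size minus one */
};

/* Append sourcelen bytes of source plus a terminator.
 * Returns 0 on success, -1 (buffer unchanged) if it would not fit. */
int strbuf_append(struct strbuf *sb, const char *source, size_t sourcelen)
{
    /* Compared this way so the sum len + sourcelen cannot wrap around. */
    if (sourcelen > sb->maxlen - sb->len) {
        return -1;
    }
    memcpy(sb->data + sb->len, source, sourcelen);
    sb->len += sourcelen;
    sb->data[sb->len] = '\0';
    return 0;
}
```

A caller would set it up as struct strbuf sb = { storage, 0, sizeof storage - 1 }; with storage initialized to an empty string.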
When people say, "strcpy() is dangerous, use strncpy() instead" (or similar statements about strcat() etc., but I am going to use strcpy() here as my focus), they mean that there is no bounds checking in strcpy(). Thus, an overly long string will result in buffer overruns. They are correct. Using strncpy() in this case will prevent buffer overruns.
I feel that strncpy() really doesn't fix bugs: it solves a problem that can be easily avoided by a good programmer.
As a C programmer, you must know the destination size before you are trying to copy strings. That is the assumption in strncpy() and strlcpy()'s last parameters too: you supply that size to them. You can also know the source size before you copy strings. Then, if the destination is not big enough, don't call strcpy(). Either reallocate the buffer, or do something else.
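The "check first, then copy" idea might look like the following sketch. copy_if_fits is a hypothetical helper name, and src is assumed to be a properly terminated string:

```c
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits completely.
 * Returns 0 on success, -1 (dst untouched) if dstsize is too small, so
 * the caller decides whether to reallocate, reject, or truncate. */
int copy_if_fits(char *dst, size_t dstsize, const char *src)
{
    size_t need = strlen(src) + 1;   /* src must be NUL-terminated */
    if (need > dstsize) {
        return -1;
    }
    memcpy(dst, src, need);          /* length is already known, no rescan needed */
    return 0;
}
```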
Why do I not like strncpy()?
strncpy() is a bad solution in most cases: your string is going to be truncated without any notice—I would rather write extra code to figure this out myself and then take the course of action that I want to take, rather than let some function decide for me about what to do.
strncpy() is very inefficient. It writes to every byte in the destination buffer. You don't need those thousands of '\0' at the end of your destination.
It doesn't write a terminating '\0' if the destination is not big enough. So, you must do so yourself anyway. The complexity of doing this is not worth the trouble.
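The padding and the missing terminator can both be seen in a small wrapper that does the extra work strncpy() leaves to the caller. This is only an illustration of the behavior described above; strncpy_terminated is a hypothetical name and n is assumed to be nonzero:

```c
#include <string.h>

/* Hypothetical helper: copy src with strncpy() and then force
 * termination. Returns 1 if src was truncated, 0 otherwise.
 * Assumes n > 0. */
int strncpy_terminated(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);          /* pads the rest of dst with '\0' if src is short */
    if (dst[n - 1] != '\0') {      /* src filled the buffer: no terminator was written */
        dst[n - 1] = '\0';
        return 1;
    }
    return 0;
}
```

Copying "abcdef" into a 4-byte buffer yields "abc" and a return value of 1, while copying "hi" into an 8-byte buffer writes six NUL bytes after the two characters.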
Now, we come to strlcpy(). The changes from strncpy() make it better, but I am not sure if the specific behavior of strl* warrants their existence: they are far too specific. You still have to know the destination size. It is more efficient than strncpy() because it doesn't necessarily write to every byte in the destination. But it solves a problem that can be solved by doing: *((char *)mempcpy(dst, src, n)) = 0;.
I don't think anyone says that strlcpy() or strlcat() can lead to security issues; what they (and I) are saying is that they can result in bugs, for example, when you expect the complete string to be written instead of a part of it.
The main issue here is: how many bytes to copy? The programmer must know this and if he doesn't, strncpy() or strlcpy() won't save him.
strlcpy() and strlcat() are not standard, neither ISO C nor POSIX. So, their use in portable programs is impossible. In fact, strlcat() has two different variants: the Solaris implementation is different from the others for edge cases involving length 0. This makes it even less useful than otherwise.
I think Ulrich and others think it'll give a false sense of security. Accidentally truncating strings can have security implications for other parts of the code (for example, if a file system path is truncated, the program might not be performing operations on the intended file).
There are two "problems" related to using strl functions:
You have to check return values to avoid truncation.
The C1X standard draft writers and Drepper argue that programmers won't check the return value. Drepper says we should somehow know the length, use memcpy, and avoid string functions altogether. The standards committee argues that the secure strcpy should return nonzero on truncation unless otherwise stated by the _TRUNCATE flag. The idea is that people are more likely to use if(strncpy_s(...)).
Cannot be used on non-strings.
Some people think that string functions should never crash even when fed bogus data. This affects standard functions such as strlen, which in normal conditions will segfault on such data. The new standard will include many such functions. The checks of course have a performance penalty.
The upside over the proposed standard functions is that you can know how much data you missed with strl functions.
I don't think strlcpy and strlcat are considered insecure, or at least that isn't the reason why they're not included in glibc - after all, glibc includes strncpy and even strcpy.
The criticism they got was that they are allegedly inefficient, not insecure.
According to the Secure Portability paper by Damien Miller:
The strlcpy and strlcat API properly check the target buffer’s bounds,
nul-terminate in all cases and return the length of the source string,
allowing detection of truncation. This API has been adopted by most
modern operating systems and many standalone software packages,
including OpenBSD (where it originated), Sun Solaris, FreeBSD, NetBSD,
the Linux kernel, rsync and the GNOME project. The notable exception
is the GNU standard C library, glibc [12], whose maintainer
steadfastly refuses to include these improved APIs, labelling them
“horribly inefficient BSD crap” [4], despite prior evidence that they
are faster is most cases than the APIs they replace [13]. As a result,
over 100 of the software packages present in the OpenBSD ports tree
maintain their own strlcpy and/or strlcat replacements or equivalent
APIs - not an ideal state of affairs.
That is why they are not available in glibc, but it is not true that they are not available on Linux. They are available on Linux in libbsd:
https://libbsd.freedesktop.org/
They're packaged in Debian and Ubuntu and other distros. You can also just grab a copy and use in your project - it's short and under a permissive license:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/strlcpy.c?rev=1.11
Security is not a boolean. C functions are not wholly "secure" or "insecure", "safe" or "unsafe". When used incorrectly, a simple assignment operation in C can be "insecure". strlcpy() and strlcat() may be used safely (securely) just as strcpy() and strcat() can be used safely when the programmer provides the necessary assurances of correct usage.
The main point with all of these C string functions, standard and not-so-standard, is the level to which they make safe/secure usage easy. strcpy() and strcat() are not trivial to use safely; this is proven by the number of times that C programmers have gotten it wrong over the years and nasty vulnerabilities and exploits have ensued. strlcpy() and strlcat() and for that matter, strncpy() and strncat(), strncpy_s() and strncat_s(), are a bit easier to use safely, but still, non-trivial. Are they unsafe/insecure? No more than memcpy() is, when used incorrectly.
strlcpy may trigger SIGSEGV, if src is not NUL-terminated.
/* Not enough room in dst, add NUL and traverse rest of src */
if (n == 0) {
if (siz != 0)
*d = '\0'; /* NUL-terminate dst */
while (*s++)
;
}
return(s - src - 1); /* count does not include NUL */

Resources