I was looking through the manuals on strcpy() and strcat(). It seems there's no way to evaluate the "success" of the function call (i.e. the return value will never be NULL). Is that correct?
It's just assumed that if you follow the rules for the input of these functions that the output will be valid? Just wanted to make sure I wasn’t missing anything here…
These functions cannot fail in any well-defined way. They will either succeed, or things have gone horribly wrong (e.g. missing 0 char or too small output buffer), and anything could happen.
Because I don't think functions like that really have any way of knowing what "success" is. For instance, strcpy is really just a memcpy that stops after the terminating NUL. So as far as C is concerned, it is just taking data from one memory location and copying it to another. It doesn't know how the data is supposed to look, or be formatted in the ways you expect. I guess the only real way you'd know whether it "succeeded" is whether you end up getting a segfault.
These functions are guaranteed to work, provided that you're not invoking undefined behaviour. In particular, the memory that you're writing to needs to be allocated. There is really no way for them to fail except crashing your program.
Generally, because it can be hard to tell how many bytes will be written, use of these functions is discouraged. Use strncpy and strncat if you can.
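If you do use strncpy, remember that it does not NUL-terminate the destination when the source doesn't fit. A minimal sketch of a bounded copy (the function name is mine):

#include <string.h>

void bounded_copy(char *dst, size_t dstsize, const char *src)
{
    /* Copies at most dstsize - 1 characters; assumes dstsize > 0.
       strncpy does not terminate on truncation, so do it manually. */
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}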
Related
I've seen some posters stating that strdup is evil. Is there a consensus on this? I've used it without any guilty feelings and can see no reason why it is worse than using malloc/memcpy.
The only thing I can think might earn strdup a reputation is that callers might misuse it (eg. not realise they have to free the memory returned; try to strcat to the end of a strdup'ed string). But then malloc'ed strings are not free from the possibility of misuse either.
Thanks for the replies and apologies to those who consider the question unhelpful (votes to close). In summary of the replies, it seems that there is no general feeling that strdup is evil per se, but a general consensus that it can, like many other parts of C, be used improperly or unsafely.
There is no 'correct' answer really, but for the sake of accepting one, I accepted @nneoneo's answer - it could equally have been @R..'s answer.
Two reasons I can think of:
It's not strictly ANSI C, but rather POSIX. Consequently, some compilers (e.g. MSVC) discourage use (MSVC prefers _strdup), and technically the C standard could define its own strdup with different semantics since str is a reserved prefix. So, there are some potential portability concerns with its use.
It hides its memory allocation. Most other str functions don't allocate memory, so users might be misled (as you say) into believing the returned string doesn't need to be freed.
But, aside from these points, I think that careful use of strdup is justified, as it can reduce code duplication and provides a nice implementation for common idioms (such as strdup("constant string") to get a mutable, returnable copy of a literal string).
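For instance, a minimal sketch of that idiom (assuming a POSIX system where <string.h> declares strdup):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *copy = strdup("constant string");  /* mutable, heap-allocated copy */
    if (copy != NULL) {
        copy[0] = 'C';    /* fine: the copy is writable, the literal is not */
        puts(copy);
        free(copy);       /* the hidden allocation still needs an explicit free */
    }
    return 0;
}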
My answer is rather in support of strdup: it is no worse than any other function in C.
POSIX is a standard and strdup is not too difficult to implement if portability becomes an issue.
Whether to free the memory allocated by strdup shouldn't be an issue if one takes a little time to read the man page and understand how strdup works. If you don't understand how a function works, it's very likely you are going to mess something up; this applies to any function, not just strdup.
In C, memory & most other things are managed by the programmer, so strdup is no worse than forgetting to free malloc'ed memory, failing to null terminate a string, using incorrect format string in scanf (and invoking undefined behaviour), accessing dangling pointer etc.
(I really wanted to post this as a comment, but couldn't add in a single comment. Hence, posted it as an answer).
I haven't really heard strdup described as evil, but some possible reasons some people dislike it:
It's not standard C (but is in POSIX). However I find this reason silly because it's nearly a one-line function to add on systems that lack it.
Blindly duplicating strings all over the place rather than using them in-place when possible wastes time and memory and introduces failure cases into code that might otherwise be failure-free.
When you do need a copy of a string, it's likely you actually need more space to modify or build on it, and strdup does not give you that.
I think the majority of the concern about strdup comes from security concerns regarding buffer overruns and improperly formatted strings. If a non-null-terminated string is passed to strdup, it can allocate a string of undefined length. I don't know if this can be specifically leveraged into an attack, but in general it is good secure coding practice to only use string functions which take a maximum length instead of relying on the null character alone.
Many people obviously don't, but I personally find strdup evil for several reasons,
the main one being it hides the allocation. The other str* functions and most other standard functions require no free afterwards, so strdup looks innocuous enough and you can forget to clean up after it. dmckee suggested to just add it to your mental list of functions that need cleaning up after, but why? I don't see a big advantage over reducing two medium-length lines to one short one.
It always allocates memory on the heap, and with C99's VLAs, you have yet another reason to just use strcpy (you don't even need malloc). You can't always do this, but when you can, you should.
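For illustration, a sketch of the VLA version (names are mine):

#include <stdio.h>
#include <string.h>

void demo(const char *s)
{
    char copy[strlen(s) + 1];   /* C99 VLA sized to fit, including the NUL */
    strcpy(copy, s);            /* no heap allocation, nothing to free */
    copy[0] = 'H';              /* the copy is freely mutable */
    puts(copy);
}

int main(void)
{
    demo("jello");
    return 0;
}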
It's not part of the ISO standard (but it is part of the POSIX standard, thanks Wiz), but that's really a small point as R.. mentioned that it can be added easily. If you write portable programs, I'm not sure how you'd tell if it was already defined or not though...
These are of course a few of my own reasons, no one else's. To answer your question, there is no consensus that I'm aware of.
If you're writing programs just for yourself and you find strdup no problem, then there's much less reason not to use it than if you are writing a program to be read by many people of many skill levels and ages.
My reason for disliking strdup, which hasn't been mentioned, is that it is resource allocation without a natural pair. Let's try a silly game: I say malloc, you say free. I say open, you say close. I say create, you say destroy. I say strdup, you say ...?
Actually, the answer to strdup is of course free, and the function would have been better named malloc_and_strcpy to make that clear. But many C programmers don't think of it that way and forget that strdup requires its opposite or "ending" free to deallocate.
In my experience, it is very common to find memory leaks in code which calls strdup. It's an odd function which combines strlen, malloc and strcpy.
Why is strdup considered to be evil
Conflicts with Future language directions.
Reliance on errno state.
Easier to make your own strdup() that is not quite like the POSIX one nor the future C2x one.
With C2x on the way, and with it the near-certain inclusion of strdup(), using strdup() before then has these problems.
The C2x proposed strdup() does not mention errno, whereas POSIX does. Code that relies on errno being set to ENOMEM or EINVAL can have trouble in the future.
The C2x proposed char *strdup(const char *s1) uses a const char * as the parameter. User-coded versions of strdup() too often use char *s1, a difference that can break select code that counts on the char * signature, e.g. through function pointers.
User code that rolled its own strdup() was not following C's future language directions, under which "function names that begin with str, mem, or wcs and a lowercase letter may be added to the declarations in the <string.h> header", and so may incur a conflict between the library's new strdup() and the user's strdup().
If user code wants strdup() code before C2x, consider naming it something different like my_strdup() and use a const char * parameter. Minimize or avoid any reliance on the state of errno after the call returns NULL.
My my_strdup() effort - warts and all.
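Something along these lines - a sketch only, with a distinct name, a const char * parameter, and no promises about errno:

#include <stdlib.h>
#include <string.h>

char *my_strdup(const char *s)
{
    size_t size = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, s, size);
    return copy;                   /* NULL on allocation failure; errno unspecified */
}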
I need a cross-platform way of treating a memory buffer as a FILE*. I have seen other questions which point out that there is no portable way to do this (fmemopen on Linux is what I need, but it fails on the Windows platform).
I have tried using the setvbuf and it seems to work. Can anyone please point out the exact problem of using setvbuf function?
Also, I have seen the C standard draft WG14/N1256, and 7.19.5.6 says:
the contents of array at any time are indeterminate.
I don't understand if I use my own buffer how can its contents be indeterminate?
EDIT: Thanks for all the answers. Not using this method anymore.
No really, there's no portable way to do this.
Using setvbuf may appear to work but you're really invoking undefined behavior, and it will fail in unexpected ways at unexpected times. The GNU C library does have fmemopen(3) as an extension, as you mentioned, but it's not portable to non-GNU systems.
If you're using some library that requires a FILE* pointer and you only have the required data in memory, you'll just have to write it out to a temporary file and pass in a handle to that file. Ideally, your library should provide an alternative function that takes a memory pointer instead of a file pointer, but if not, you're out of luck (and you should complain to the library writer about that deficiency).
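A minimal sketch of that workaround using the standard tmpfile() (the function name is mine; tmpfile() itself has quirks on some Windows setups):

#include <stdio.h>

/* Copy a memory buffer into a temporary file and hand back a FILE*.
   Portable, at the cost of an extra copy and disk I/O. */
FILE *buffer_to_file(const void *buf, size_t len)
{
    FILE *f = tmpfile();    /* deleted automatically when closed */
    if (f == NULL)
        return NULL;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        return NULL;
    }
    rewind(f);              /* let the consumer read from the start */
    return f;
}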
The setvbuf() function is used to tell the FILE the memory to be used as its buffer, but it does not specify how this memory will be used: that's up to the implementation.
Thus, the contents of the buffer are indeterminate at any time, and if it happens to work for you, it is just by chance.
It depends on what you want to do with the buffer/FILE*. You can certainly perform simple operations and get away with them, but you cannot guarantee that all of the FILE* operations will perform as expected on your memory buffer.
Sorry, there is simply no cross-platform one-liner to get full FILE* characteristics, I've tried myself many times haha
what you can try:
#define-wrapped OS-specific logic (see the sketch after this list)
Look further into the interface you are trying to interact with. At some point it just plays with a buffer anyway. Then splice in your buffer. This is what I did.
Your technique + faith.
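For the first option, a rough sketch - the platform guards and feature-test macro here are illustrative, not definitive:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

FILE *mem_as_file(void *buf, size_t len)
{
#if defined(__unix__) || defined(__APPLE__)
    return fmemopen(buf, len, "r+");   /* POSIX.1-2008; a glibc extension before that */
#else
    (void)buf; (void)len;
    return NULL;   /* e.g. fall back to the tmpfile() approach shown earlier */
#endif
}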
Why is strlen() not checking for NULL?
if I do strlen(NULL), the program segmentation faults.
Trying to understand the rationale behind it (if any).
The rationale behind it is simple: how can you check the length of something that does not exist?
Also, unlike "managed languages", there is no expectation that the runtime system will handle invalid data or data structures correctly. (This type of issue is exactly why more "modern" languages are more popular for applications that are not computation-heavy or performance-critical.)
A standard template in C would look like this:

size_t someStrLen;

if (someStr != NULL) // or if (someStr)
    someStrLen = strlen(someStr);
else
{
    // handle error.
}
The portion of the language standard that defines the string handling library states that, unless specified otherwise for the specific function, any pointer arguments must have valid values.
The philosophy behind the design of the C standard library is that the programmer is ultimately in the best position to know whether a run-time check really needs to be performed. Back in the days when your total system memory was measured in kilobytes, the overhead of performing an unnecessary runtime check could be pretty painful. So the C standard library doesn't bother doing any of those checks; it assumes that the programmer has already done it if it's really necessary. If you know you will never pass a bad pointer value to strlen (such as when you're passing in a string literal or a locally allocated array), then there's no need to clutter up the resulting binary with an unnecessary check against NULL.
The standard does not require it, so implementations just avoid a test and potentially an expensive jump.
A little macro to help your grief:
#define strlens(s) ((s) == NULL ? 0 : strlen(s))
Three significant reasons:
The standard library and the C language are designed assuming that the programmer knows what he is doing, so a null pointer isn't treated as an edge case, but rather as a programmer's mistake that results in undefined behaviour;
It incurs runtime overhead - calling strlen thousands of times and checking str != NULL before every call is not reasonable unless the programmer is treated as a sissy;
It adds to the code size - it could only be a few instructions, but if you adopt this principle and do it everywhere, it can inflate your code significantly.
size_t strlen ( const char * str );
http://www.cplusplus.com/reference/clibrary/cstring/strlen/
strlen takes a pointer to a character array as its parameter; NULL is not a valid argument to this function.
Are functions like strcpy, gets, etc. always dangerous? What if I write a code like this:
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str1 = "abcdefghijklmnop";
    char *str2 = malloc(100);
    strcpy(str2, str1);
}
This way the function doesn't accept arguments (parameters...), and the str variable will always be the same length... which is here 16 or slightly more depending on the compiler version... but yeah, 100 will suffice as of March 2011 :).
Is there a way for a hacker to take advantage of the code above?
10x!
Absolutely not. Contrary to Microsoft's marketing campaign for their non-standard functions, strcpy is safe when used properly.
The above is redundant, but mostly safe. The only potential issue is that you're not checking the malloc return value, so you may be dereferencing NULL (as pointed out by kotlinski). In practice, this is likely to cause an immediate SIGSEGV and program termination.
An improper and dangerous use would be:
char array[100];
char *uncheckedInput;
// ... read a line into uncheckedInput, without checking its length ...
// Extract a substring, again without checking the length
strcpy(array, uncheckedInput + 10);
This is unsafe because the strcpy may overflow, causing undefined behavior. In practice, it is likely to overwrite other local variables (itself a major security breach). One of these may be the return address. Through a return-to-libc attack, the attacker may be able to use C functions like system to execute arbitrary programs. There are other possible consequences of overflows.
However, gets is indeed inherently unsafe, and will be removed from the next version of C (C1X). There is simply no way to ensure the input won't overflow (causing the same consequences given above). Some people would argue it's safe when used with a known input file, but there's really no reason to ever use it. POSIX's getline is a far better alternative.
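For example, a minimal sketch with getline, which allocates and grows the buffer for you:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;    /* getline allocates and resizes this for us */
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, stdin);
    if (len != -1)
        printf("read %zd bytes\n", len);
    free(line);           /* caller owns the buffer */
    return 0;
}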
Also, the length of str1 doesn't vary by compiler. It should always be 17, including the terminating NUL.
You are forcefully stuffing completely different things into one category.
The function gets is indeed always dangerous. There's no way to make a safe call to gets, regardless of what steps you are willing to take and how defensive you are willing to get.

The function strcpy is perfectly safe if you are willing to take the [simple] necessary steps to make sure that your calls to strcpy are safe.
That already puts gets and strcpy in vastly different categories, which have nothing in common with regard to safety.
The popular criticisms directed at the safety aspects of strcpy are based entirely on anecdotal social observations as opposed to formal facts, e.g. "programmers are lazy and incompetent, so don't let them use strcpy". Taken in the context of C programming, this is, of course, utter nonsense. Following this logic, we should also declare the division operator just as unsafe, for exactly the same reasons.
In reality, there are no problems with strcpy whatsoever. gets, on the other hand, is a completely different story, as I said above.
Yes, it is dangerous. After 5 years of maintenance, your code will look like this:
int main(void)
{
    char *str1 = "abcdefghijklmnop";

    /* ... enough lines have been inserted here so as to not have
       str1 and str2 nice and close to each other on the screen ... */

    char *str2 = malloc(100);
    strcpy(str2, str1);
}
at that point, someone will go and change str1 to
str1 = "THIS IS A REALLY LONG STRING WHICH WILL NOW OVERRUN ANY BUFFER BEING USED TO COPY IT INTO UNLESS PRECAUTIONS ARE TAKEN TO RANGE CHECK THE LIMITS OF THE STRING. AND FEW PEOPLE REMEMBER TO DO THAT WHEN BUGFIXING A PROBLEM IN A 5 YEAR OLD BUGGY PROGRAM"
and forget to look where str1 is used and then random errors will start happening...
Your code is not safe. The return value of malloc is unchecked; if it fails and returns 0, the strcpy will give undefined behavior.
Besides that, I see no problem other than that the example basically does not do anything.
strcpy isn't dangerous as long as you know that the destination buffer is large enough to hold the characters of the source string; otherwise strcpy will happily copy more characters than your target buffer can hold, which can lead to several unfortunate consequences (overwriting the stack or other variables, which can result in crashes, stack-smashing attacks & co.).
But: if you are handed a generic char * as input which hasn't already been checked, the only way to be sure is to apply strlen to the string and check whether it's too large for your buffer; however, now you have to walk the entire source string twice: once to check its length, and once to perform the copy.
This is suboptimal since, if strcpy were a little more advanced, it could receive the size of the buffer as a parameter and stop copying if the source string were too long; in a perfect world, this is how strncpy would behave (following the pattern of the other strn*** functions). However, this is not a perfect world, and strncpy is not designed to do this. Instead, the nonstandard (but popular) alternative is strlcpy, which truncates rather than running past the bounds of the target buffer.
Several CRT implementations do not provide this function (notably glibc), but you can still take one of the BSD implementations and put it in your application. A standard (but slower) alternative is to use snprintf with "%s" as the format string.
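A sketch of the snprintf fallback (the function name is mine):

#include <stdio.h>

/* Truncating string copy: always NUL-terminates for dstsize > 0 and
   never writes more than dstsize bytes, much like strlcpy. */
void copy_trunc(char *dst, size_t dstsize, const char *src)
{
    snprintf(dst, dstsize, "%s", src);
}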
That said, since you're programming in C++ (edit: I see now that the C++ tag has been removed), why don't you just avoid all the C-string nonsense (when you can, obviously) and go with std::string? All these potential security problems vanish and string operations become much easier.
The only way malloc may fail is when an out-of-memory error occurs, which is a disaster by itself. You cannot reliably recover from it because virtually anything may trigger it again, and the OS is likely to kill your process anyway.
As you point out, under constrained circumstances strcpy isn't dangerous. It is more typical to take in a string parameter and copy it to a local buffer, which is when things can get dangerous and lead to a buffer overrun. Just remember to check your copy lengths before calling strcpy, and null-terminate the string afterward.
Aside from potentially dereferencing NULL (as you do not check the result from malloc), which is UB but likely not a security threat, there is no potential security problem with this.
gets() is always unsafe; the other functions can be used safely.
gets() is unsafe even when you have full control on the input -- someday, the program may be run by someone else.
The only safe way to use gets() is to use it for a single run thing: create the source; compile; run; delete the binary and the source; interpret results.
A lot of C code that frees pointers calls:
if (p)
    free(p);
But why? I thought the C standard says the free function does nothing when given a NULL pointer. So why the extra explicit check?
The construct:
free(NULL);
has always been OK in C, back to the original UNIX compiler written by Dennis Ritchie. Pre-standardisation, some poor compilers might not have handled it correctly, but these days any compiler that does not cannot legitimately call itself a compiler for the C language. Using it typically leads to clearer, more maintainable code.
As I understand it, the no-op on NULL was not always there.
In the bad old days of C (back around 1986, on a pre-ANSI-standard cc compiler) free(NULL) would dump core. So most devs tested for NULL/0 before calling free.

The world has come a long way, and it appears that we don't need to do the test anymore. But old habits die hard ;)
http://discuss.joelonsoftware.com/default.asp?design.4.194233.15
I tend to write "if (p) free(p)" a lot, even if I know it's not needed.
I partially blame myself, because I learned C in the old days when free(NULL) would segfault, and I still feel uncomfortable not doing it.
But I also blame the C standard for not being consistent. If, for example, fclose(NULL) were well defined, I would have no problem writing:
free(p);
fclose(f);
Which is something that happens very often when cleaning up things.
Unfortunately, it seems strange to me to write
free(p);
if (f) fclose(f);
and I end up with
if (p) free(p);
if (f) fclose(f);
I know, it's not a rational reason but that's my case :)
Compilers, even when inlining, are not smart enough to know the function will return immediately. Pushing parameters onto the stack, setting up the call, and so on is obviously more expensive than testing a pointer. I think it is always good practice to avoid the execution of anything, even when that anything is a no-op.
Testing for null is a good practice. An even better practice is to ensure your code does not reach this state and therefore eliminate the need for the test altogether.
There are two distinct reasons why a pointer variable could be NULL:
because the variable is used for what in type theory is called an option type, and holds either a pointer to an object, or NULL to represent nothing,
because it points to an array, and may therefore be NULL if the array has zero length (as malloc(0) is allowed to return NULL; whether it does is implementation-defined).
Although this is only a logical distinction (in C there are neither option types nor special pointers to arrays and we just use pointers for everything), it should always be made clear how a variable is used.
That the C standard requires free(NULL) to do nothing is the necessary counterpart to the fact that a successful call to malloc(0) may return NULL. It is not meant as a general convenience, which is why for example fclose() does require a non-NULL argument. Abusing the permission to call free(NULL) by passing a NULL that does not represent a zero-length array feels hackish and wrong.
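A small sketch of that counterpart:

#include <stdlib.h>

int main(void)
{
    size_t n = 0;                         /* a zero-length array is legal */
    int *arr = malloc(n * sizeof *arr);   /* may legitimately return NULL here */
    /* ... use arr[0] .. arr[n-1], i.e. nothing when n == 0 ... */
    free(arr);                            /* well defined even if arr is NULL */
    return 0;
}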
If you rely on free(0) being OK, and it's normal for your pointer to be NULL at this point, please say so in a comment: // may be NULL
This may be merely self-explanatory code, saying yes I know, I also use p as a flag.
There can be a custom implementation of free() in a mobile environment. In that case, free(0) can cause a problem. (Yeah, bad implementation.)
if (p)
    free(p);
why another explicit check?
If I write something like that, it's to convey the specific knowledge that the pointer may be NULL...to assist in readability and code comprehension. Because it looks a bit weird to make that an assert:
assert(p || !p);
free(p);
(Beyond looking strange, compilers are known to complain about "condition always true" if you turn your warnings up in many such cases.)
So I see it as good practice, if it's not clear from the context.
The converse case, of a pointer being expected to be non-null, is usually evident from the previous lines of code:
...
Unhinge_Widgets(p->widgets);
free(p); // why `assert(p)`...you just dereferenced it!
...
But if it's non-obvious, having the assert may be worth the characters typed.