Creating an array with user-defined length in C

I'm trying to make an array whose starting length is variable, to hold a string. The code should take a count from the user and size the array accordingly. This is only a test, and I'm posting it here because I want to know whether it's good practice or a mistake, and whether there is anything I should know about or keep in mind.
Note: I'm talking about C, not C++.
#include <stdio.h>
int main()
{
    int c, b, count;
    scanf("%d", &c);
    count = c + 1;
    getchar();
    char a[count];
    for (c = b = 0; c != count && b != '\n'; c++)
    {
        b = getchar();
        a[c] = b;
    }
    a[c] = '\0';
    printf("%s", a);
    printf("%d", c - 1);
}
I don't need to change the size of the array at execution time.
I was testing, and I don't remember exactly why I use the c variable at first instead of count directly, but I remember the first getchar was to flush the buffer, because the program didn't work without it.
I don't know why I need the getchar. If I delete it, the program fails.
Anyway, the program works fine. When you run it, it first expects a number via scanf and then expects the text.
If the text is longer than the size of the array, the program will ignore the excess.
The number is the size of the array.
My questions are:
Is it good practice to use a[variable] for this job?
Why do I need the getchar?
Will it be portable? I mean, I don't know whether some systems or standards, like some old C compilers, reject this.
Are there better methods?

Is it good practice to use a[variable] for this job?
It depends on your compiler configuration. Variable length arrays have been supported since C99. However, since there's no good reason to use one in such a simple program, use the standard malloc instead. Here's an in-depth discussion of the topic.
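For illustration, here is a minimal sketch of the same allocation done with malloc(); the error checks are my own assumptions about how failure might be handled, not part of the original code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int c;
    if (scanf("%d", &c) != 1 || c < 1) {
        return 1;                        /* reject missing or nonsensical sizes */
    }
    char *a = malloc((size_t)c + 1);     /* +1 for the terminating '\0' */
    if (a == NULL) {
        return 1;                        /* malloc can report failure; a VLA cannot */
    }
    /* ... read into a[] as before ... */
    free(a);
    return 0;
}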
Why do I need the getchar?
There's likely some input still buffered up in your terminal, and that first getchar() is discarding it. Try printing the value out to the screen to see what it is; that might help you figure it out.
Will it be portable?
See my answer to your first question. It will probably work on modern versions of gcc, but, for example, it doesn't work in Windows C (which is still basically stuck on C89).

Is it good practice to use a[variable] for this job?
Where the size is determined by arbitrary user input without imposed limits, it is not good practice. A user could easily enter a very large value and overrun the stack.
Use either dynamic allocation, or check and coerce the input value to some sensible limit.
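As a sketch of the second option, you could clamp the requested length before declaring the array; MAX_LEN here is an arbitrary cap chosen purely for illustration:
#include <stddef.h>

#define MAX_LEN 4096   /* arbitrary illustrative limit */

size_t clamp_len(long requested)
{
    if (requested < 1)       return 1;
    if (requested > MAX_LEN) return MAX_LEN;
    return (size_t)requested;
}
A VLA declared afterwards as char a[clamp_len(c)] then has a known upper bound on its stack usage.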
Also worth noting that VLAs are not supported in C++ or by some C compilers, so the code lacks portability.
Why do I need the getchar?
The user has to enter at least a newline for scanf() to return, but the %d format specifier does not consume non-digit characters, so the newline remains buffered. Your code is also easily broken by entering additional non-digit characters: for example, "16a<newline>" will assign 16 to c, and the getchar() will then discard the 'a', leaving the newline buffered just as before. A better solution is:
int ch;
while ((ch = getchar()) != '\n' && ch != EOF) { }
Will it be portable? I mean, I don't know whether some systems or standards, like some old C compilers, reject this.
Adoption of C99 VLAs is variable, and in C11 they are optional in any case.
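If you need to support C11 implementations that omit VLAs, you can test for that at compile time; a minimal sketch:
#if defined(__STDC_NO_VLA__)
    /* VLAs unavailable: fall back to malloc()/free() */
#else
    /* VLAs available */
#endif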
Are there better methods?
I hesitate to say "better", but there are certainly safer, more flexible, and more portable ways. With respect to the array allocation, you could use malloc().
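By way of example, here is a sketch of the whole program reworked around malloc(), with the buffered newline drained explicitly; the error handling is my own assumption:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n, ch;
    if (scanf("%d", &n) != 1 || n < 0) {
        return 1;
    }
    while ((ch = getchar()) != '\n' && ch != EOF) { }  /* drain the rest of the line */

    char *a = malloc((size_t)n + 1);
    if (a == NULL) {
        return 1;
    }
    int i = 0;
    while (i < n && (ch = getchar()) != EOF && ch != '\n') {
        a[i++] = (char)ch;
    }
    a[i] = '\0';
    printf("%s %d\n", a, i);
    free(a);
    return 0;
}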

Using malloc() or calloc() would be a better choice in C:
https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm

Related

How to definitely solve the scanf input stream problem

Suppose I want to run the following C snippet:
scanf("%d" , &some_variable);
printf("something something\n\n");
printf("Press [enter] to continue...")
getchar(); //placed to give the user some time to read the something something
This snippet will not pause! The problem is that scanf will leave the "enter" (\n) character in the input stream[1], messing up everything that comes after it; in this context the getchar() will eat the \n and not wait for an actual new character.
Since I was told not to use fflush(stdin) (I don't really get why, though), the best solution I've been able to come up with is simply to define my own scan function at the start of my code:
void nsis(int *pointer) { // nsis: acronym for "no shenanigans integer scanf"
    scanf("%d", pointer);
    getchar(); // this will clean the input stream every time the scan function is called
}
And then we simply use nsis in place of scanf. This should fly. However, it seems like a really homebrew, put-together-with-duct-tape solution. How do professional C developers handle this mess? Do they not use scanf at all? Do they simply accept working with a dirty input stream? What is the standard here?
I wasn't able to find a definitive answer on this anywhere! Every source I could find mentioned a different (and sketchy) solution...
EDIT: In response to everyone commenting some version of "just don't use scanf": OK, I can do that, but what is the purpose of scanf then? Is it simply a useless, broken function that should never be used? Why is it in the libraries to begin with then?
This seems really absurd, especially considering all beginners are taught to use scanf...
[1]: The \n left behind is the one that the user typed when inputting the value of the variable some_variable, and not the one present into the printf.
but what is the purpose of scanf then?
An excellent question.
Is it simply a useless broken function that should never be used?
It is almost useless. It is, arguably, quite broken. It should almost never be used.
Why is it in the libraries to begin with then?
My personal belief is that it was an experiment. It tries to be the opposite of printf. But that turned out not to be such a good idea in practice, and the function never got used very much, and pretty much fell out of favor, except for one particular use case...
This seems really absurd, especially considering all beginners are taught to use scanf...
You're absolutely right. It is really quite absurd.
There's a decent reason why all beginners are taught to use scanf, though. During week 1 of your first C programming class, you might write the little program
#include <stdio.h>
int main()
{
    int size = 5;
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++)
            putchar('*');
        putchar('\n');
    }
}
to print a square. And during that first week, to make a square of a different size, you just edit the line int size = 5; and recompile.
But pretty soon — say, during week 2 — you want a way for the user to enter the size of the square, without having to recompile. You're probably not ready to muck around with argv. You're probably not ready to read a line of text using fgets and convert it back to an integer using atoi. (You're probably not even ready to seriously contemplate the vast differences between the integer 5 and the string "5" at all.) So — during week 2 of your first C programming class — scanf seems like just the ticket.
That's the "one particular use case" I was talking about. And if you only used scanf to read small integers into simple C programs during the second week of your first C programming class, things wouldn't be so bad. (You'd still have problems forgetting the &, but that would be more or less manageable.)
The problem (though this is again my personal belief) is that it doesn't stop there. Virtually every instructor of beginning C classes teaches students to use scanf. Unfortunately, few or none of those instructors ever explicitly tell students that scanf is a stopgap, to be used temporarily during that second week, and to be emphatically graduated beyond in later weeks. And, even worse, many instructors go on to assign more advanced problems, involving scanf, for which it is absolutely not a good solution, such as trying to do robust or "user friendly" input validation.
scanf's only virtue is that it seems like a nice, simple way to get small integers and other simple input from the user into your early programs. But the problem — actually a big, shuddering pile of 17 separate problems — is that scanf turns out to be vastly complicated and full of exceptions and hard to use, precisely the opposite of what you'd want in order to make things easy for beginners. scanf is only useful for beginners, and it's almost perfectly useless for beginners. It has been described as being like square training wheels on a child's bicycle.
How do professional C developers handle this mess?
Quite simply: by not using scanf at all. For one thing, very few production C programs print prompts to a line-based screen and ask users to type something followed by Return. And for those programs that do work that way, professional C developers unhesitatingly use fgets or the like to read a full line of input as text, then use other techniques to break down the line to extract the necessary information.
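A sketch of that style, with names of my own invention: read a whole line with fgets, then pull the number out of it with strtol.
#include <stdio.h>
#include <stdlib.h>

int read_int(int *out)                    /* hypothetical helper name */
{
    char line[64];
    if (fgets(line, sizeof line, stdin) == NULL) {
        return 0;                         /* EOF or read error */
    }
    char *end;
    long v = strtol(line, &end, 10);
    if (end == line) {
        return 0;                         /* no digits at the start of the line */
    }
    *out = (int)v;
    return 1;
}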
In answer to your initial question, there's no good answer. One of the fundamental rules of scanf usage (a set of rules, by the way, that no instructor ever teaches) is that you should never try to mix scanf and getchar (or fgets) in the same program. If there were a good way to make your "Press [enter] to continue..." code work after having called scanf, we wouldn't need that rule.
If you do want to try to flush the extra newline, so that a later call to getchar might work, there are several questions here with a bunch of good answers:
scanf() leaves the newline character in the buffer
Using fflush(stdin)
How to properly flush stdin in fgets loop
There's one more unrelated point that ends up being pretty significant to your question. When C was invented, there was no such thing as a GUI with multiple windows. Therefore no C programmer ever had the problem of having their output disappear before they could read it. Therefore no C programmer ever felt the need to write printf("Press [enter] to continue..."); followed by getchar(). I believe (another personal belief) that it is egregiously bad behavior for any vendor of a GUI-based C compiler to rig things up so that the output disappears upon program exit. Persistent output windows ought to be the default, for the benefit of beginning C programmers, with some kind of non-default option to turn that behavior off for those who don't want it.
Is scanf broken? No, it is not. It is an excellent input function when you want to parse free-form input data where few errors are to be expected. Free form means here that newlines are not relevant, exactly as when you read/write very long paragraphs on a normal screen. And expecting few errors is common when you read from files.
The scanf family of functions has another nice point: you have the same syntax when reading from the standard input stream, a file stream or a character string. It can easily parse simple common types, and it provides a minimal return value that allows cautious programmers to know whether all or only part of the expected data could be decoded.
That being said, it has major drawbacks: first, being a C function, it cannot directly control whether the programmer has passed types matching the format specifications; and second, as beginners are not consistently hit on the head when they forget to check its return value, it is really too easy to write fully broken programs with it.
But the rule is:
if input is expected to be line oriented, first use fgets to get lines and then sscanf, testing the return values of both (see the sketch below)
only if input is expected to be free form (irrelevant newlines) should scanf be used directly, and never without testing its return value, except for trivial tests.
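A minimal sketch of the line-oriented rule, with both return values tested:
#include <stdio.h>

int main(void)
{
    char line[128];
    int x, y;
    while (fgets(line, sizeof line, stdin) != NULL) {   /* test fgets */
        if (sscanf(line, "%d %d", &x, &y) == 2) {       /* test sscanf's count */
            printf("got %d and %d\n", x, y);
        } else {
            fprintf(stderr, "malformed line skipped\n");
        }
    }
    return 0;
}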
Another drawback is that beginners hope it to be clever. It can indeed parse simple input formats, but is only a poor man's parser: do not use it as a generic parser because that is not what it is intended for.
Provided those rules are observed, it is a nice tool consistent with most of C language and its standard library: a simple tool to do simple things. It is up to programmers or library implementers to build richer tools.
I have been using the C language for more than 30 years, and I was never bitten by scanf (well, I was when I was a beginner, but I now know that I was to blame). I have simply tried for decades to only use it for what it can do...

gets() and fputs() are dangerous functions?

In the computer lab at school we wrote a program using gets and fputs, and the compiler returned an error saying gets is a dangerous function to use, and a similar error for fputs,
but at home when I type in this bit of code:
#include <stdio.h>
int main(void)
{
    FILE *fp;
    char name[20];
    fp = fopen("name.txt", "w");
    gets(name);
    fputs(name, fp);
    fclose(fp);
}
I get no errors whatsoever. The one at school was similar to this one, just a bit lengthier and with more variables.
I use Code::Blocks at home and the default gcc provided with Fedora at school.
Could it be a problem with the compiler?
With gets you need to know exactly how many characters you will read and accordingly use a large enough buffer. If you use a buffer which is smaller than the input you read, you end up writing beyond the bounds of your allocated buffer, and this results in undefined behavior and an invalid program.
Instead you should use fgets which allows you to specify how much data to read.
You don't get any errors because most likely your allocated buffer name is big enough to hold the input, but if it were not, that would be a problem, and hence the compiler issues the warning.
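For illustration, here is a sketch of the posted program reworked with fgets; the strcspn() line trims the trailing newline that fgets keeps (gets used to strip it):
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("name.txt", "w");
    char name[20];
    if (fp == NULL) {
        return 1;
    }
    if (fgets(name, sizeof name, stdin) != NULL) {   /* never writes past name[19] */
        name[strcspn(name, "\n")] = '\0';            /* drop the trailing newline */
        fputs(name, fp);
    }
    fclose(fp);
    return 0;
}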
gets is certainly dangerous since there's no way to prevent buffer overflow.
For example, if your user entered 150 characters, that would almost certainly cause problems for your program. Use of scanf with an unbounded "%s" format specifier should also be avoided for input you have no control over.
However, the use of gets should not be an error, since it complies with the standard. At most, it should be a warning (unless you, as the developer, configure something like "treat warnings as errors").
fputs is fine, not dangerous at all.
See here for a robust user input function, using fgets, which can be used to prevent buffer overflow.
It would just be the different settings of the different compilers. Maybe the compiler that Codeblocks uses isn't as verbose or has warnings turned off.
Regardless of the compiler they are dangerous functions to use as they have no checks for buffer overflow. Use fgets or fputs instead.
The other answers have all addressed gets, which is really and truly dangerous.
But the question also mentioned fputs. The fputs function is perfectly safe; it does not have these kinds of security concerns.
I believe the OP was probably mistaken in suggesting that the compiler had warned about fputs.
As for problems, there isn't any problem with either of the compilers. If you look at the link provided by Timothy Jones, you will understand why this warning is issued. As for the different compilers, compilers are configured differently to issue different levels of warnings.

If one complains about gets(), why not do the same with scanf("%s", ...)?

From man gets:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
Almost everywhere I see scanf being used in a way that should have the same problem (buffer overflow/buffer overrun): scanf("%s", string). Does this problem exist in this case? Why are there no references to it in the scanf man page? Why does gcc not warn when compiling this with -Wall?
PS: I know that there is a way to specify the maximum length of the string in the scanf format string:
char str[10];
scanf("%9s",str);
Edit: I am not asking whether the preceding code is right or not. My question is: if scanf("%s", string) is always wrong, why are there no warnings, and why is there nothing about it in the man page?
The answer is simply that no-one has written the code in GCC to produce that warning.
As you point out, a warning for the specific case of "%s" (with no field width) is quite appropriate.
However, bear in mind that this only applies to scanf(), vscanf(), fscanf() and vfscanf(). This format specifier can be perfectly safe with sscanf() and vsscanf(), so the warning should not be issued in those cases. This means that you cannot simply add it to the existing "scanf-style-format-string" analysis code; you would have to split that into "fscanf-style-format-string" and "sscanf-style-format-string" options.
I'm sure if you produce a patch for the latest version of GCC it stands a good chance of being accepted (and of course, you will need to submit patches for the glibc header files too).
Using gets() is never safe. scanf() can be used safely, as you said in your question. However, determining if you're using it safely is a more difficult problem for the compiler to work out (e.g. if you're calling scanf() in a function where you pass in the buffer and a character count as arguments, it won't be able to tell); in that case, it has to assume that you know what you're doing.
When the compiler looks at the format string of scanf, it sees a string! That's assuming the format string is not built at run time. Some compilers, like GCC, have extra functionality to analyze the format string if it is known at compile time. That extra functionality is not comprehensive, because in some situations a run-time overhead would be needed, which is a no-no for languages like C. For example, can you detect an unsafe usage without inserting some extra hidden code in this case:
char *str;
size_t size;
scanf("%zu", &size);
str = malloc(size);
scanf("%9s", str); // how can the compiler determine if this is a safe call?!
Of course, there are ways to write safe code with scanf if you specify the number of characters to read and make sure there is enough memory to hold the string. In the case of gets, there is no way to specify the number of characters to read.
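To make that concrete, here is a sketch where the width limit is built into the format string at run time, since the buffer size is only known then:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size;
    char fmt[32];
    if (scanf("%zu", &size) != 1 || size == 0) {
        return 1;
    }
    char *str = malloc(size);
    if (str == NULL) {
        return 1;
    }
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);   /* e.g. "%9s" when size is 10 */
    if (scanf(fmt, str) == 1) {
        printf("read: %s\n", str);
    }
    free(str);
    return 0;
}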
I am not sure why the man page for scanf doesn't mention the possibility of a buffer overrun, but vanilla scanf is not a secure option. A rather dated link - Link - shows this to be the case. Also, check this (not gcc but informative nevertheless) - Link
It may be simply that scanf will allocate space on the heap based on how much data is read in. Since it doesn't allocate the buffer and then read until the null character is read, it doesn't risk overwriting the buffer. Instead, it reads into its own buffer until the null character is found, and presumably copies that buffer into another of the correct size at the end of the read.

Which functions from the standard library must (should) be avoided?

I've read on Stack Overflow that some C functions are "obsolete" or "should be avoided". Can you please give me some examples of this kind of function and the reason why?
What alternatives to those functions exist?
Can we use them safely - any good practices?
Deprecated Functions
Insecure
A perfect example of such a function is gets(), because there is no way to tell it how big the destination buffer is. Consequently, any program that reads input using gets() has a buffer overflow vulnerability. For similar reasons, one should use strncpy() in place of strcpy() and strncat() in place of strcat().
Yet more examples include the tmpfile() and mktemp() functions, due to potential security issues with overwriting temporary files; these are superseded by the more secure mkstemp() function.
Non-Reentrant
Other examples include gethostbyaddr() and gethostbyname() which are non-reentrant (and, therefore, not guaranteed to be threadsafe) and have been superseded by the reentrant getaddrinfo() and freeaddrinfo().
You may be noticing a pattern here... either lack of security (possibly by failing to include enough information in the signature to possibly implement it securely) or non-reentrance are common sources of deprecation.
Outdated, Non-Portable
Some other functions simply become deprecated because they duplicate functionality and are not as portable as other variants. For example, bzero() is deprecated in favor of memset().
Thread Safety and Reentrance
You asked, in your post, about thread safety and reentrance. There is a slight difference. A function is reentrant if it does not use any shared, mutable state. So, for example, if all the information it needs is passed into the function, and any buffers needed are also passed into the function (rather than shared by all calls to the function), then it is reentrant. That means that different threads, by using independent parameters, do not risk accidentally sharing state. Reentrancy is a stronger guarantee than thread safety. A function is thread safe if it can be used by multiple threads concurrently; specifically, if:
It is reentrant (i.e. it does not share any state between calls), or:
It is non-reentrant, but it uses synchronization/locking as needed for shared state.
In general, in the Single UNIX Specification and IEEE 1003.1 (i.e. "POSIX"), any function which is not guaranteed to be reentrant is not guaranteed to be thread safe. So, in other words, only functions which are guaranteed to be reentrant may be portably used in multithreaded applications (without external locking). That does not mean, however, that implementations of these standards cannot choose to make a non-reentrant function threadsafe. For example, Linux frequently adds synchronization to non-reentrant functions in order to add a guarantee (beyond that of the Single UNIX Specification) of threadsafety.
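To illustrate the distinction, compare strtok(), which hides its position in static state, with POSIX strtok_r(), which takes that state as an explicit argument and is therefore reentrant; a sketch:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char text[] = "a,b;c,d";
    char *save_outer, *save_inner;

    /* two tokenizations can be interleaved because each carries its own
       save pointer; plain strtok() would clobber its hidden state here */
    for (char *grp = strtok_r(text, ";", &save_outer); grp != NULL;
         grp = strtok_r(NULL, ";", &save_outer)) {
        for (char *tok = strtok_r(grp, ",", &save_inner); tok != NULL;
             tok = strtok_r(NULL, ",", &save_inner)) {
            printf("%s\n", tok);
        }
    }
    return 0;
}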
Strings (and Memory Buffers, in General)
You also asked if there is some fundamental flaw with strings/arrays. Some might argue that this is the case, but I would argue that no, there is no fundamental flaw in the language. C and C++ require you to pass the length/capacity of an array separately (it is not a ".length" property as in some other languages). This is not a flaw, per se. Any C and C++ developer can write correct code simply by passing the length as a parameter where needed. The problem is that several APIs that required this information failed to specify it as a parameter. Or assumed that some MAX_BUFFER_SIZE constant would be used. Such APIs have now been deprecated and replaced by alternative APIs that allow the array/buffer/string sizes to be specified.
Scanf (In Answer to Your Last Question)
Personally, I use the C++ iostreams library (std::cin, std::cout, the << and >> operators, std::getline, std::istringstream, std::ostringstream, etc.), so I do not typically deal with that. If I were forced to use pure C, though, I would personally just use fgetc() or getchar() in combination with strtol(), strtoul(), etc. and parse things manually, since I'm not a huge fan of varargs or format strings. That said, to the best of my knowledge, there is no problem with [f]scanf(), [f]printf(), etc. so long as you craft the format strings yourself, you never pass arbitrary format strings or allow user input to be used as format strings, and you use the formatting macros defined in <inttypes.h> where appropriate. (Note, snprintf() should be used in place of sprintf(), but that has to do with failing to specify the size of the destination buffer and not the use of format strings). I should also point out that, in C++, boost::format provides printf-like formatting without varargs.
Once again people are repeating, mantra-like, the ludicrous assertion that the "n" versions of the str functions are safe versions.
If that was what they were intended for then they would always null terminate the strings.
The "n" versions of the functions were written for use with fixed length fields (such as directory entries in early file systems) where the nul terminator is only required if the string does not fill the field. This is also the reason why the functions have strange side effects that are pointlessly inefficient if just used as replacements - take strncpy() for example:
If the array pointed to by s2 is a string that is shorter than n bytes, null bytes are appended to the copy in the array pointed to by s1, until n bytes in all are written.
As buffers allocated to handle filenames are typically 4kbytes this can lead to a massive deterioration in performance.
If you want "supposedly" safe versions then obtain - or write your own - strl routines (strlcpy, strlcat etc) which always nul terminate the strings and don't have side effects. Please note though that these aren't really safe as they can silently truncate the string - this is rarely the best course of action in any real-world program. There are occasions where this is OK but there are also many circumstances where it could lead to catastrophic results (e.g. printing out medical prescriptions).
Several answers here suggest using strncat() over strcat(); I'd suggest that strncat() (and strncpy()) should also be avoided. It has problems that make it difficult to use correctly and lead to bugs:
the length parameter to strncat() is related to (but not quite exactly - see the 3rd point) the maximum number of characters that can be copied to the destination rather than the size of the destination buffer. This makes strncat() more difficult to use than it should be, particularly if multiple items will be concatenated to the destination.
it can be difficult to determine if the result was truncated (which may or may not be important)
it's easy to have an off-by-one error. As the C99 standard notes, "Thus, the maximum number of characters that can end up in the array pointed to by s1 is strlen(s1)+n+1" for a call that looks like strncat( s1, s2, n)
strncpy() also has an issue that can result in bugs when you try to use it in an intuitive way - it doesn't guarantee that the destination is null terminated. To ensure that, you have to specifically handle that corner case by dropping a '\0' into the buffer's last location yourself (at least in certain situations).
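The usual shape of that fix is a sketch like this (sizes and names are illustrative only):
#include <string.h>

void copy_name(char dst[16], const char *src)
{
    strncpy(dst, src, 15);    /* may leave dst unterminated if src is long... */
    dst[15] = '\0';           /* ...so terminate it explicitly yourself */
}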
I'd suggest using something like OpenBSD's strlcat() and strlcpy() (though I know that some people dislike those functions; I believe they're far easier to use safely than strncat()/strncpy()).
Here's a little of what Todd Miller and Theo de Raadt had to say about problems with strncat() and strncpy():
There are several problems encountered when strncpy() and strncat() are used as safe versions of strcpy() and strcat(). Both functions deal with NUL-termination and the length parameter in different and non-intuitive ways that confuse even experienced programmers. They also provide no easy way to detect when truncation occurs. ... Of all these issues, the confusion caused by the length parameters and the related issue of NUL-termination are most important. When we audited the OpenBSD source tree for potential security holes we found rampant misuse of strncpy() and strncat(). While not all of these resulted in exploitable security holes, they made it clear that the rules for using strncpy() and strncat() in safe string operations are widely misunderstood.
OpenBSD's security audit found that bugs with these functions were "rampant". Unlike gets(), these functions can be used safely, but in practice there are a lot of problems because the interface is confusing, unintuitive and difficult to use correctly. I know that Microsoft has also done analysis (though I don't know how much of their data they may have published), and as a result have banned (or at least very strongly discouraged - the 'ban' might not be absolute) the use of strncat() and strncpy() (among other functions).
Some links with more information:
http://www.usenix.org/events/usenix99/full_papers/millert/millert_html/
http://en.wikipedia.org/wiki/Off-by-one_error#Security_implications
http://blogs.msdn.com/michael_howard/archive/2004/10/29/249713.aspx
http://blogs.msdn.com/michael_howard/archive/2004/11/02/251296.aspx
http://blogs.msdn.com/michael_howard/archive/2004/12/10/279639.aspx
http://blogs.msdn.com/michael_howard/archive/2006/10/30/something-else-to-look-out-for-when-reviewing-code.aspx
Standard library functions that should never be used:
setjmp.h
setjmp(). Together with longjmp(), these functions are widely recognized as incredibly dangerous to use: they lead to spaghetti programming, they come with numerous forms of undefined behavior, and they can cause unintended side effects in the program environment, such as affecting values stored on the stack. References: MISRA-C:2012 rule 21.4, CERT C MSC22-C.
longjmp(). See setjmp().
stdio.h
gets(). The function has been removed from the C language (as per C11), as it was unsafe as per design. The function was already flagged as obsolete in C99. Use fgets() instead. References: ISO 9899:2011 K.3.5.4.1, also see note 404.
stdlib.h
atoi() family of functions. These have no error handling but invoke undefined behavior whenever errors occur. Completely superfluous functions that can be replaced with the strtol() family of functions. References: MISRA-C:2012 rule 21.7.
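For comparison, a sketch of the same conversion done with strtol() and full error detection (parse_int is a made-up helper name):
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

int parse_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0') return 0;                       /* empty or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return 0;  /* out of range */
    *out = (int)v;
    return 1;                          /* atoi() can report none of these conditions */
}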
string.h
strncat(). Has an awkward interface that is often misused. It is a mostly superfluous function. Also see the remarks for strncpy().
strncpy(). The intention of this function was never to be a safer version of strcpy(). Its sole purpose was always to handle an ancient string format on Unix systems, and that it got included in the standard library is a known mistake. This function is dangerous because it may leave the string without null termination and programmers are known to often use it incorrectly. References: Why are strlcpy and strlcat considered insecure?, with a more detailed explanation here: Is strcpy dangerous and what should be used instead?.
Standard library functions that should be used with caution:
assert.h
assert(). Comes with overhead and should generally not be used in production code. It is better to use an application-specific error handler which displays errors but does not necessarily close down the whole program.
signal.h
signal(). References: MISRA-C:2012 rule 21.5, CERT C SIG32-C.
stdarg.h
va_arg() family of functions. The presence of variadic functions in a C program is almost always an indication of poor program design. They should be avoided unless you have very specific requirements.
stdio.h
Generally, this whole library is not recommended for production code, as it comes with numerous cases of poorly-defined behavior and poor type safety.
fflush(). Perfectly fine to use for output streams. Invokes undefined behavior if used for input streams.
gets_s(). Safe version of gets() included in C11 bounds-checking interface. It is preferred to use fgets() instead, as per C standard recommendation. References: ISO 9899:2011 K.3.5.4.1.
printf() family of functions. Resource heavy functions that come with lots of undefined behavior and poor type safety. sprintf() also has vulnerabilities. These functions should be avoided in production code. References: MISRA-C:2012 rule 21.6.
scanf() family of functions. See the remarks about printf(). Also, scanf() is vulnerable to buffer overruns if not used correctly. fgets() is preferred when possible. References: CERT C INT05-C, MISRA-C:2012 rule 21.6.
tmpfile() family of functions. Comes with various vulnerability issues. References: CERT C FIO21-C.
stdlib.h
malloc() family of functions. Perfectly fine to use in hosted systems, though be aware of well-known issues in C90 and therefore don't cast the result. The malloc() family of functions should never be used in freestanding applications. References: MISRA-C:2012 rule 21.3.
Also note that realloc() is dangerous in case you overwrite the old pointer with the result of realloc(). In case the function fails, you create a leak.
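The leak-free pattern is to assign realloc()'s result to a temporary first; a sketch:
#include <stdlib.h>

int grow(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return 0;       /* *buf is untouched and still owned by the caller */
    }
    *buf = tmp;         /* only overwrite the old pointer on success */
    return 1;
}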
system(). Comes with lots of overhead and although portable, it is often better to use system-specific API functions instead. Comes with various poorly-defined behavior. References: CERT C ENV33-C.
string.h
strcat(). See remarks for strcpy().
strcpy(). Perfectly fine to use, unless the size of the data to be copied is unknown or larger than the destination buffer. If no check of the incoming data size is done, there may be buffer overruns. Which is no fault of strcpy() itself, but of the calling application - that strcpy() is unsafe is mostly a myth created by Microsoft.
strtok(). Alters the caller string and uses internal state variables, which could make it unsafe in a multi-threaded environment.
Some people would claim that strcpy and strcat should be avoided, in favor of strncpy and strncat. This is somewhat subjective, in my opinion.
They should definitely be avoided when dealing with user input - no doubt here.
In code "far" from the user, when you just know the buffers are long enough, strcpy and strcat may be a bit more efficient because computing the n to pass to their cousins may be superfluous.
Avoid
strtok for multithreaded programs, as it's not thread-safe.
gets, as it can cause buffer overflows.
It is probably worth adding again that strncpy() is not the general-purpose replacement for strcpy() that its name might suggest. It is designed for fixed-length fields that don't need a nul terminator (it was originally designed for use with UNIX directory entries, but can be useful for things like encryption key fields).
It is easy, however, to use strncat() as a replacement for strcpy():
if (dest_size > 0)
{
    dest[0] = '\0';
    strncat(dest, source, dest_size - 1);
}
(The if test can obviously be dropped in the common case, where you know that dest_size is definitely nonzero).
Also check out Microsoft's list of banned APIs. These are APIs (including many already listed here) that are banned from Microsoft code because they are often misused and lead to security problems.
You may not agree with all of them, but they are all worth considering. They add an API to the list when its misuse has led to a number of security bugs.
It is very hard to use scanf safely. Good use of scanf can avoid buffer overflows, but you are still vulnerable to undefined behavior when reading numbers that don't fit in the requested type. In most cases, fgets followed by self-parsing (using sscanf, strchr, etc.) is a better option.
But I wouldn't say "avoid scanf all the time". scanf has its uses. As an example, let's say you want to read user input in a char array that's 10 bytes long. You want to remove the trailing newline, if any. If the user enters more than 9 characters before a newline, you want to store the first 9 characters in the buffer and discard everything until the next newline. You can do:
char buf[10];
scanf("%9[^\n]%*[^\n]", buf);
getchar();
Once you get used to this idiom, it's shorter and in some ways cleaner than:
char buf[10];
if (fgets(buf, sizeof buf, stdin) != NULL) {
    char *nl;
    if ((nl = strrchr(buf, '\n')) == NULL) {
        int c;
        while ((c = getchar()) != EOF && c != '\n') {
            ;
        }
    } else {
        *nl = 0;
    }
}
Almost any function that deals with NUL terminated strings is potentially unsafe.
If you are receiving data from the outside world and manipulating it via the str*() functions, then you are setting yourself up for catastrophe.
Don't forget about sprintf - it is the cause of many problems. This is because the alternative, snprintf, has different implementations on different platforms, which can make your code unportable.
linux: http://linux.die.net/man/3/snprintf
windows: http://msdn.microsoft.com/en-us/library/2ts7cx93%28VS.71%29.aspx
In case 1 (Linux), the return value is the number of characters needed to store the entire output (so if it is greater than or equal to the size of the given buffer, the output was truncated).
In case 2 (Windows), the return value is a negative number if the output was truncated.
Generally you should avoid functions that are not:
buffer-overflow safe (a lot of such functions have already been mentioned here)
thread safe/reentrant (strtok, for example)
In the manual page of each function you should search for keywords like: safe, sync, async, thread, buffer, bugs
In all the string-copy/move scenarios - strcat(), strncat(), strcpy(), strncpy(), etc. - things go much better (more safely) if a couple of simple heuristics are enforced:
1. Always NUL-fill your buffer(s) before adding data.
2. Declare character buffers as [SIZE+1], with a macro constant.
For example, given:
#define BUFSIZE 10
char Buffer[BUFSIZE+1] = { 0x00 }; /* The compiler NUL-fills the rest */
we can use code like:
memset(Buffer, 0x00, sizeof(Buffer));
strncpy(Buffer, "12345678901234567890", BUFSIZE);
relatively safely. The memset() should appear before the strncpy(), even though we initialized Buffer at compile-time, because we don't know what garbage other code placed into it before our function was called. The strncpy() will truncate the copied data to "1234567890", and will not NUL-terminate it. However, since we have already NUL-filled the entire buffer - sizeof(Buffer), rather than BUFSIZE - there is guaranteed to be a final "out-of-scope" terminating NUL anyway, as long as we constrain our writes using the BUFSIZE constant, instead of sizeof(Buffer).
Buffer and BUFSIZE likewise work fine for snprintf():
memset(Buffer, 0x00, sizeof(Buffer));
if (snprintf(Buffer, BUFSIZE, "Data: %s", "Too much data") >= BUFSIZE) {
    /* Do some error-handling */
} /* If using MFC, you need if(... < 0) instead */
Even though snprintf() specifically writes only BUFSIZE-1 characters, followed by NUL, this works safely. So we "waste" an extraneous NUL byte at the end of Buffer... we prevent both buffer-overflow and unterminated-string conditions, for a pretty small memory cost.
My call on strcat() and strncat() is more hard-line: don't use them. It is difficult to use strcat() safely, and the API for strncat() is so counter-intuitive that the effort needed to use it properly negates any benefit. I propose the following drop-in:
#define strncat(target, source, bufsize) snprintf(target, bufsize, "%s%s", target, source)
It is tempting to create a strcat() drop-in, but not a good idea:
#define strcat(target,source) snprintf(target,sizeof(target),"%s%s",target,source)
because target may be a pointer (thus sizeof() does not return the information we need). I don't have a good "universal" solution to instances of strcat() in your code.
A problem I frequently encounter from "strFunc()-aware" programmers is an attempt to protect against buffer-overflows by using strlen(). This is fine if the contents are guaranteed to be NUL-terminated. Otherwise, strlen() itself can cause a buffer-overrun error (usually leading to a segmentation violation or other core-dump situation), before you ever reach the "problematic" code you are trying to protect.
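One way to make that length check itself safe is to bound the scan, for example with memchr(); a sketch:
#include <string.h>

size_t bounded_len(const char *buf, size_t size)
{
    /* unlike strlen(), this never reads past buf[size - 1],
       even if no terminator is present */
    const char *nul = memchr(buf, '\0', size);
    return nul ? (size_t)(nul - buf) : size;
}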
atoi does no error checking at all. I use strtol instead, per the recommendation in the man page.

When/why is it a bad idea to use the fscanf() function?

In an answer there was an interesting statement: "It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that."
Could you expand upon when/why it might be better to use fgets() and sscanf() to read some file?
Imagine a file with three lines:
1
2b
c
Using fscanf() to read integers, the first line would read fine but on the second line fscanf() would leave you at the 'b', not sure what to do from there. You would need some mechanism to move past the garbage input to see the third line.
If you do a fgets() and sscanf(), you can guarantee that your file pointer moves a line at a time, which is a little easier to deal with. In general, you should still be looking at the whole string to report any odd characters in it.
I prefer the latter approach myself, although I wouldn't agree with the statement that "it's almost always a bad idea to use fscanf()"... fscanf() is perfectly fine for most things.
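A sketch of that recovery mechanism against the three-line file above; the trailing %c trick rejects lines like "2b" that start with digits but contain garbage (the file name is made up):
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("numbers.txt", "r");
    char line[64], junk;
    int value;
    if (fp == NULL) {
        return 1;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%d %c", &value, &junk) == 1) {  /* one number, nothing after it */
            printf("read %d\n", value);
        } else {
            fprintf(stderr, "skipping bad line: %s", line);
        }
    }
    fclose(fp);
    return 0;
}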
The case where this comes into play is when you match character literals. Suppose you have:
int n = fscanf(fp, "%d,%d", &i1, &i2);
Consider two possible inputs "323,A424" and "323A424".
In both cases fscanf() will return 1 and the next character read will be an 'A'. There is no way to determine if the comma was matched or not.
That being said, this only matters if finding the actual source of the error is important. In cases where knowing that the input is malformed is enough, fscanf() is actually superior to writing custom parsing code.
When fscanf() fails, due to an input failure or a matching failure, the file pointer (that is, the position in the file from which the next byte will be read) is left in a position other than where it would be had the fscanf() succeeded. This is typically undesirable in sequential file reads. Reading one line at a time results in the file input being predictable, while single line failures can be handled individually.
There are two reasons:
scanf() can leave stdin in a state that's difficult to predict; this makes error recovery difficult if not impossible (this is less of a problem with fscanf()); and
The entire scanf() family takes pointers as arguments, but no length limit, so they can overrun a buffer and alter unrelated variables that happen to be after the buffer, causing seemingly random memory corruption errors that are very difficult to understand, find, and debug, particularly for less experienced C programmers.
Novice C programmers are often confused about pointers and the “address-of” operator, and frequently omit the & where it's needed, or add it “for good measure” where it's not. This causes “random” segfaults that can be hard for them to find. This isn't scanf()'s fault, so I leave it off my list, but it is worth bearing in mind.
After 23 years, I still remember it being a huge pain when I started C programming and didn't know how to recognize and debug these kinds of errors, and (as someone who spent years teaching C to beginners) it's very hard to explain them to a novice who doesn't yet understand pointers and stack.
Anyone who recommends scanf() to a novice C programmer should be flogged mercilessly.
OK, maybe not mercilessly, but some kind of flogging is definitely in order ;o)
It's almost always a bad idea to use the fscanf() function as it can leave your file pointer in an unknown location on failure. I prefer to use fgets() to get each line in and then sscanf() that.
You can always use ftell() to find out the current position in the file, and then decide what to do from there. Basically, if you know what to expect, then feel free to use fscanf().
Basically, there's no way to tell the function not to go out of bounds for the memory area you've allocated for it.
A number of replacements have come out, like fnscanf, which attempt to fix those functions by letting you specify a maximum limit on what is written, thus allowing them not to overflow.
