I'm learning about buffer overflows today and I came across many examples of vulnerable programs. What makes me curious is whether there is any reason to work with a program's arguments like this:
int main(int argc, char *argv[])
{
    char argument_buffer[100];

    strcpy(argument_buffer, argv[1]);

    if (strcmp(argument_buffer, "testArg") == 0)
    {
        printf("Hello!\n");
    }
    // ...
}
Instead of simply:
int main(int argc, char *argv[])
{
    if (strcmp(argv[1], "testArg") == 0)
    {
        printf("Hello!\n");
    }
}
Please note that I know about the downsides of strcpy etc. - it's just an example. My question is: is there any real reason to use temporary buffers to store arguments from argv? I assume there isn't, which makes me curious why the pattern shows up in overflow examples if it is never used in practice. Maybe it is purely theoretical.
One possible real-world example: a program that renames *.foo to *.bar; you'll need both the original file name and a copy of it with the .foo part changed to .bar for the call to rename().
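For illustration, a minimal sketch of that idea (the fixed buffer size and the lack of error reporting are deliberate simplifications):

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char new_name[256];

    if (argc < 2 || strlen(argv[1]) >= sizeof new_name)
        return 1;

    strcpy(new_name, argv[1]);              /* writable copy of the original name */

    char *ext = strrchr(new_name, '.');
    if (ext != NULL && strcmp(ext, ".foo") == 0)
    {
        strcpy(ext, ".bar");                /* change .foo to .bar in the copy */
        rename(argv[1], new_name);          /* needs both the original and the copy */
    }
    return 0;
}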
IIRC, in the old days argv and its contents were not guaranteed to be writable and stable on all platforms. C89/C90 (ANSI C) standardized some of the existing practices. The same goes for envp[]. The habit of copying may also have been inspired by the absence of memory protection on older platforms (such as MS-DOS). Normally (and nowadays) the OS and/or the CRT takes care of copying the arguments from the caller's memory into the process's private memory arena.
Some programs prepend filenames with default paths:
void OpenLogFile(const char *fileName) {
    char pathName[256];

    sprintf(pathName, "/var/log/%s", fileName);
    logFd = open(pathName, ...);
    ...
}

int main(int argc, char **argv) {
    ...
    OpenLogFile(argv[i]);
    ...
}
If whoever invokes the program passes in a name longer than 255 - 9 characters or so (the buffer size minus the "/var/log/" prefix, leaving one byte for the terminator), sprintf writes past the end of pathName, and boom.
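For comparison, a bounded sketch of the same routine (assuming a global logFd and POSIX open; snprintf reports truncation instead of overflowing):

#include <stdio.h>
#include <fcntl.h>

static int logFd;

void OpenLogFile(const char *fileName) {
    char pathName[256];

    /* snprintf writes at most sizeof pathName bytes and always NUL-terminates;
     * a return value >= sizeof pathName means the name did not fit. */
    int n = snprintf(pathName, sizeof pathName, "/var/log/%s", fileName);
    if (n < 0 || (size_t)n >= sizeof pathName)
        return;   /* refuse the over-long name rather than open a truncated path */

    logFd = open(pathName, O_WRONLY | O_APPEND);
}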
I am not answering this in terms of buffer overflows or security, but strictly in terms of why someone might want to make a copy of argv's contents.
If your program accepts a lot of arguments, such as flags that change the execution path or processing mode, you might want to write argv's contents directly to a log file or store them temporarily in a buffer. If every decision based on argv is made in main and you still want to log argv's contents, you probably would not need to copy them to a buffer.
If you depended on dispatched threads, processes, or even a subroutine making decisions based on the argv contents, you would probably want the argv values placed in a buffer so you could pass them around.
Edit:
If you are worried about passing around a pointer, copy argv's contents to a fixed-size buffer.
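A rough sketch of that idea (the buffer size and the space-separated joining are arbitrary choices made for illustration):

#include <stdio.h>
#include <string.h>

/* Flatten argv into one fixed-size buffer, e.g. to hand to a logger
 * or another thread without relying on the argv pointers themselves. */
static void copy_args(int argc, char *argv[], char *buf, size_t bufsize)
{
    buf[0] = '\0';
    for (int i = 0; i < argc; ++i) {
        /* stop quietly once the buffer is full */
        if (strlen(buf) + strlen(argv[i]) + 2 > bufsize)
            break;
        strcat(buf, argv[i]);
        if (i + 1 < argc)
            strcat(buf, " ");
    }
}

int main(int argc, char *argv[])
{
    char arg_log[1024];

    copy_args(argc, argv, arg_log, sizeof arg_log);
    printf("invoked as: %s\n", arg_log);
    return 0;
}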
Related
In C on a small embedded system, is there any reason not to do this:
const char *filter_something(const char *original, const int max_length)
{
    static char buffer[BUFFER_SIZE];

    // checking inputs for safety omitted
    // copy input to buffer here with appropriate filtering etc.
    return buffer;
}
This is essentially a utility function. The source is flash memory, which may be corrupted, so we do a kind of "safe copy" to make sure we end up with a null-terminated string. I chose to use a static buffer and make it available read-only to the caller.
A colleague is telling me that I am somehow not respecting the scope of the buffer by doing this; to me it makes perfect sense for the use case we have.
I really do not see any reason not to do this. Can anyone give me one?
(LATER EDIT)
Many thanks to all who responded. You have generally confirmed my ideas on this, which I am grateful for. I was looking for major reasons not to do this, and I don't think there are any. To clarify a few points:
Reentrancy/thread safety is not a concern. It is a small (bare-metal) embedded system with a single run loop. This code will never be called from ISRs.
In this system we are not short on memory, but we do want very predictable behavior. For this reason I prefer declaring an object like this statically, even though it might be a little "wasteful". We have already had issues with large objects declared carelessly on the stack, which caused intermittent crashes (now fixed, but it took a while to diagnose). So in general I prefer static allocation, simply to get predictability, reliability, and fewer potential issues downstream.
So basically it's a case of taking a certain approach for a specific system design.
Pro
The behavior is well defined; the static buffer exists for the duration of the program and may be used by the program after filter_something returns.
Cons
Returning a static buffer is prone to error because people writing calls to the routines may neglect or be unaware that a static buffer is returned. This can lead to attempts to use multiple instances of the buffer from multiple calls to the function (in the same thread or different threads). Clear documentation is essential.
The static buffer exists for the duration of the program, so it occupies space at times when it may not be needed.
It really depends on how filter_something is used. Take the following as an example:
#include <stdio.h>
#include <string.h>

const char *filter(const char *original, const int max_length)
{
    static char buffer[1024];

    memset(buffer, 0, sizeof(buffer));
    memcpy(buffer, original, max_length);
    return buffer;
}

int main()
{
    const char *strone, *strtwo;
    char deepone[16], deeptwo[16];

    /* Case 1 */
    printf("%s\n", filter("everybody", 10));

    /* Case 2 */
    printf("%s %s %s\n", filter("nobody", 7), filter("somebody", 9), filter("anybody", 8));

    /* Case 2 */
    if (strcmp(filter("same", 5), filter("different", 10)) == 0)
        printf("Strings same\n");
    else
        printf("Strings different\n");

    /* Case 3 - Both of these end up with the same pointer */
    strone = filter("same", 5);
    strtwo = filter("different", 10);
    if (strcmp(strone, strtwo) == 0)
        printf("Strings same\n");
    else
        printf("Strings different\n");

    /* Case 4 - You need a deep copy if you wish to compare */
    strcpy(deepone, filter("same", 5));
    strcpy(deeptwo, filter("different", 10));
    if (strcmp(deepone, deeptwo) == 0)
        printf("Strings same\n");
    else
        printf("Strings different\n");
}
The output when gcc is used is:
everybody
nobody nobody nobody
Strings same
Strings same
Strings different
When filter is used by itself, it behaves quite well.
When it is used multiple times in an expression, the result is unpredictable: all instances end up showing the contents from whichever filter call executed last, and the order in which the calls are evaluated is unspecified.
If a pointer to the result is kept, the contents it points to will not stay the same as they were when the pointer was taken. This is also a common problem when C++ coders switch to C# or Java.
If a deep copy of the result is taken, then the contents as they were at the time of the copy are preserved.
In C++, this technique is often used when returning objects with the same consequences.
It is true that the identifier buffer only has scope local to the block in which it is declared. However, because it is declared static, its lifetime is that of the full program.
So returning a pointer to a static variable is valid. In fact, many standard functions do this such as strtok and ctime.
The one thing you need to watch for is that such a function is not reentrant. For example, if you do something like this:
printf("filter 1: %s, filter 2: %s\n",
filter_something("abc", 3), filter_something("xyz", 3));
The two function calls can occur in any order, and both return the same pointer, so you'll get the same result printed twice (i.e. the result of whatever call happens to occur last) instead of two different results.
Also, if such a function is called from two different threads, you end up with a race condition with the threads reading/writing the same place.
Just to add to the previous answers: in a more abstract sense, the problem is that the filtering result is given broader scope than it ought to have. You introduce 'state' that seems useless, at least if the caller's intention is only to get a filtered string. In that case it should be the caller who creates the array, likely on the stack, and passes it as a parameter to the filtering function. It is the introduction of this state that makes possible all the problems referred to in the preceding responses.
From a program-design standpoint, it's frowned upon to return pointers to private data, in case that data was made private for a reason. That said, it's less bad design to return a pointer to a local static than it is to use spaghetti programming with "globals" (external linkage), particularly when the pointer returned is const-qualified.
One general issue with static variables, which may or may not be a problem regardless of embedded or hosted system, is re-entrancy. If the code needs to be interrupt/thread safe, then you need to implement means to achieve that.
The obvious alternative to it all is caller allocation and you've got to ask yourself why that's not an option:
void filter_something (size_t size, char dest[size], const char original[size]);
(Or if you will, [restrict size] on both pointers for a mini-optimization.)
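For illustration, a minimal sketch of what that caller-allocated version might look like (the actual filtering rule here, keeping only printable characters, is an assumption):

#include <ctype.h>
#include <stddef.h>

/* Caller provides the destination buffer; the function allocates nothing
 * and keeps no state, so it is trivially reentrant. */
void filter_something(size_t size, char dest[size], const char original[size])
{
    size_t j = 0;

    if (size == 0)
        return;

    for (size_t i = 0; i < size && original[i] != '\0'; ++i) {
        if (j + 1 >= size)                           /* leave room for the terminator */
            break;
        if (isprint((unsigned char)original[i]))     /* hypothetical filter rule */
            dest[j++] = original[i];
    }
    dest[j] = '\0';                                  /* always NUL-terminated */
}

The caller then does something like char out[64]; filter_something(sizeof out, out, raw); and owns the lifetime of the result.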
I have the following code to get a string from a user with getopt() in C:
#include <stdio.h>
#include <getopt.h>
#include "util.h"

int main(int argc, char *argv[]) {
    if (argc == 1) {
        usage(); // from util.h
        return 0;
    }

    char *argument = NULL;
    char ch;

    while ((ch = getopt(argc, argv, ":f:"))) {
        switch (ch) {
        case 'f':
            argument = optarg;
            break;
        }
    }

    // ... use `argument` with other stuff ...
}
Is this safe to do, or should I use strcpy() to copy the string into argument? If I accidentally change the contents of argument, could an attacker change stuff like environment variables?
Is this safe to do, or should I use strcpy() to copy the string into argument?
It is safe to do, and getopt() is designed to be used that way. More generally, the program arguments provided to main are specified to be writable strings belonging to the program.
Do note, however, that some getopt() implementations, notably GNU's, may reorder the elements of argv.
If I accidentally change the contents of argument, could an attacker change stuff like environment variables?
The argument strings are susceptible to bounds overflow errors just like any other objects in the program. If, through programming error or other means, your program attempts to write past the bounds of any object then the behavior is undefined. Modification of environment variables is one of the more plausible members of the unbounded space of possible manifestations of UB.
Note, however, that making a copy of the arguments doesn't help in that regard. You must avoid overrunning the bounds of any such copies, too, lest UB be triggered, with the same unbounded space of possible manifestations.
If you want to make sure that the program arguments cannot be modified through the variable argument, then it would be idiomatic to declare it as a pointer to const char instead of a pointer to (modifiable) char. That will not interfere with assigning the value of optarg to it.
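A sketch of what that might look like (it also uses int for the getopt() return value, which another answer below covers in more detail):

#include <stdio.h>
#include <getopt.h>

int main(int argc, char *argv[]) {
    const char *argument = NULL;   /* the argv string cannot be modified through this pointer */
    int ch;                        /* int, so the -1 sentinel is always representable */

    while ((ch = getopt(argc, argv, ":f:")) != -1) {
        switch (ch) {
        case 'f':
            argument = optarg;     /* assigning char * to const char * is fine */
            break;
        }
    }

    if (argument != NULL)
        printf("-f argument: %s\n", argument);
    return 0;
}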
Yes, this is "safe".
The contents of argv are supplied by the host environment, and belong to your process for its entire duration.
On argv:
The strings are modifiable, and any modifications made persist until program termination, although these modifications do not propagate back to the host environment: they can be used, for example, with strtok.
With that said, other processes (via the ps utility, for example) are generally able to read the current values of argv. Using argv for sensitive information is ill-advised.
See: Hiding secret from command line parameter on Unix and Can argv be changed at runtime (not by the app itself) for security concerns.
Aside: getopt returns an int. The distinction is important, because if char is unsigned, it cannot represent the terminating value:
If all command-line options have been parsed, then getopt() returns -1.
You should use int to reliably test against this value, otherwise the loop will continue infinitely.
int ch;
while (-1 != (ch = getopt(argc, argv, ":f:")))
I have a program which implements its functionality by making use of another one. The only thing I need to do is modify its arguments appropriately. The problem is, once I execute the other program, I never return to the original one, so using malloc to allocate space for the modified strings will cause a memory leak. What should I do? The program looks like this:
int main(int argc, char *argv[])
{
    argv[1] = ...
    argv[2] = ...
    ...
    exec("/bin/echo", argv);
}
argv[1] and argv[2] are, for example, the same original strings but with some characters appended, so they need more memory than the originals provide.
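To make the setup concrete, here is a minimal sketch of the kind of wrapper described above (the appended "-suffix" is made up, and execv from <unistd.h> stands in for the unspecified exec call):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* Build extended copies of the original arguments. */
    for (int i = 1; i < argc; ++i) {
        char *extended = malloc(strlen(argv[i]) + sizeof "-suffix");
        if (extended == NULL)
            return 1;
        strcpy(extended, argv[i]);
        strcat(extended, "-suffix");   /* hypothetical modification */
        argv[i] = extended;
    }

    /* On success, the new program image replaces this one entirely. */
    execv("/bin/echo", argv);
    perror("execv");                   /* only reached if execv fails */
    return 1;
}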
So I have a program that works sometimes, but other times it doesn't. Sometimes putting in a printf statement will make the program magically work (and sometimes a printf will make it break). This makes me think I messed up as far as pointers/allocation goes, as I still have trouble wrapping my head around some things. The program is too long to post all the code, so I'll just show where I think the problem is.
I was given a function that takes a FILE* and a few char** as parameters, returning an int. The function is parsing a file, storing necessary strings to where the double pointers point to.
Here is how I used the function (a lot of code is omitted/simplified):
char **s1;
char **s2;
char **s3;

int main(int argc, char **argv){
    s1 = (char**)malloc(20);
    s2 = (char**)malloc(20);
    s3 = (char**)malloc(20);

    /* Unnecessary code omitted */
    read(inFile);
}

int read(FILE* in){
    /* omitted code */
    parse(in, s1, s2, s3);
    printf("%s", *s1); /* This is to show how I access the strings */
}
I'm pretty sure that somewhere in my program those strings are getting overwritten because I didn't allocate the memory properly. Hopefully my mistake is visible in the code snippet I gave, because I don't have many other theories for why my code doesn't work.
Since the API to parse() is specified with char ** I believe it is safe to assume that you really do need the double-indirection in the call, but not in the declaration.
So probably what you need is to skip the malloc() calls and say:
char *s1, *s2, *s3;
...
parse(in, &s1, &s2, &s3);
This would allow parse() to allocate its own space and return the three pointers to its caller by modifying the pointers themselves. I appreciate your efforts to distill the question to its core but it might be interesting to see the prototype for parse().
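Without seeing the real prototype, a hypothetical shape of parse() that would match this calling convention might be (the token format, sizes, and helper are all assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *dup_str(const char *s)
{
    char *copy = malloc(strlen(s) + 1);   /* caller eventually frees this */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* Hypothetical: reads three whitespace-separated tokens and hands back
 * freshly allocated copies through the output parameters. */
int parse(FILE *in, char **s1, char **s2, char **s3)
{
    char tok1[64], tok2[64], tok3[64];

    if (fscanf(in, "%63s %63s %63s", tok1, tok2, tok3) != 3)
        return -1;

    *s1 = dup_str(tok1);
    *s2 = dup_str(tok2);
    *s3 = dup_str(tok3);
    return (*s1 && *s2 && *s3) ? 0 : -1;
}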
My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I am pretty sure that messing with the argv array itself, or any of the data it points to, is probably not advisable. Any suggestions?
The C99 standard says this about modifying argv (and argc):
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
Once argv has been passed into main, you can treat it like any other C array: change it in place as you like, just be aware of what you're doing with it. The contents of the array have no effect on the return code or the execution of the program other than what you explicitly do with them in code. I can't think of any reason it would need to be treated specially.
Of course, you still need to take care about accidentally accessing memory beyond the bounds of argv. The flip side of it being accessible like a normal C array is that it's also prone to access errors just like any other normal C array. (Thanks to all who pointed this out in comments and other responses!)
The latest draft of the C standard (N1256) states that there are two allowed forms of the main function:
int main (void);
int main (int argc, char* argv[]);
but the crux is the clause "or in some other implementation-defined manner". This seems to me to be a loophole in the standard large enough to drive a semi-trailer through.
Some people specifically use "const char *" for argv to disallow changes to the arguments. If your main function is defined that way, you are not permitted to change the characters that argv[] points to, as evidenced by the following program:
pax> cat qq.c
#include <stdio.h>
int main (int c, const char *v[]) {
*v[1] = 'X';
printf ("[%s]\n", v[1]);
return 0;
}
pax> gcc -o qq qq.c
qq.c: In function `main':
qq.c:3: error: assignment of read-only location
However, if you remove the "const", it works fine:
pax> cat qq2.c
#include <stdio.h>
int main (int c, char *v[]) {
*v[1] = 'X';
printf ("[%s]\n", v[1]);
return 0;
}
pax> gcc -o qq2 qq2.c ; ./qq2
[Xello]
I think this is also the case for C++. The current draft states:
All implementations shall allow both of the following definitions of main:
int main();
int main(int argc, char* argv[]);
but it doesn't specifically disallow other variants so you could presumably accept a "const" version in C++ as well (and, in fact, g++ does).
The only thing you need to be careful of is trying to increase the size of any of the elements. The standards do not mandate how they're stored so extending one argument may (probably will) affect others, or some other unrelated data.
Empirically, functions such as GNU getopt() permute the argument list without causing problems. As #Tim says, as long as you play sensibly, you can manipulate the array of pointers, and even individual strings. Just don't overrun any of the implicit array boundaries.
Some libraries do this!
The initialization function provided by the GLUT OpenGL library, glutInit(), scans for GLUT-related arguments and removes them by moving the subsequent elements of argv forward (moving the pointers, not the actual strings) and decrementing argc:
2.1 glutInit
glutInit is used to initialize the GLUT library.
Usage
void glutInit(int *argcp, char **argv);
argcp
A pointer to the program's unmodified argc variable from main. Upon return, the value pointed to by argcp will be updated, because glutInit extracts any command line options intended for the GLUT library.
argv
The program's unmodified argv variable from main. Like argcp, the data for argv will be updated because glutInit extracts any command line options understood by the GLUT library.
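A typical use, sketched below (only the argument handling is of interest here; window setup and the GLUT main loop are omitted):

#include <stdio.h>
#include <GL/glut.h>

int main(int argc, char **argv)
{
    glutInit(&argc, argv);   /* GLUT strips the options it understands from argv */

    /* argc and argv now describe only the arguments left over for the program. */
    for (int i = 1; i < argc; ++i)
        printf("remaining arg: %s\n", argv[i]);

    return 0;
}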
The operating system pushes argv and argc onto the application's stack before executing it, and you can treat them like any other stack variables.
The only time I would say that directly manipulating argv is a bad idea would be when an application changes its behavior depending on the contents of argv[0].
However, changing a program's behavior depending on argv[0] is in itself a very bad idea where portability is a concern.
Other than that, you can treat it just like you would any other array. As Jonathan said, GNU getopt() permutes the argument list non-destructively, I've seen other getopt() implementations that go as far as serializing and even hashing the arguments (useful when a program comes close to ARG_MAX).
Just be careful with your pointer arithmetic.
The original allocation of argv is left as a compiler/runtime choice.
So it may not be safe to modify it willy-nilly. Many systems build it on the stack, so it is automatically deallocated when main returns. Others build it on the heap and free it (or not) when main returns.
It is safe to change the value of an argument, as long as you don't try to make it longer (buffer overrun error). It is safe to shuffle the order of the arguments.
To remove arguments you've preprocessed, something like this will work:
(Lots of error conditions are not checked for, "--special" appearing anywhere other than as the first argument is not handled, etc. This is, after all, just a proof of concept.)
#include <stdbool.h>
#include <string.h>

int main(int argc, char **argv)
{
    bool doSpecial = false;          // an assumption

    if (argc > 1 && 0 == strcmp(argv[1], "--special"))
    {
        doSpecial = true;            // no longer an assumption

        // remove the "--special" argument,
        // but do copy the NULL at the end.
        for (int i = 1; i < argc; ++i)
            argv[i] = argv[i + 1];
        --argc;
    }

    // all normal processing with "--special" removed.
    // the doSpecial flag is available if wanted.
    return 0;
}
But see this for full manipulation (the part of the libiberty library that is used to manipulate argv-style vectors):
http://www.opensource.apple.com/source/gcc/gcc-5666.3/libiberty/argv.c
It is licensed under the GNU LGPL.