void print( char * str ){ printf("%s",str); }
void some_function(){
... Loads a file into `char * buffer` ...
... Stores the function print before `buffer` address ...
((void(*)(void))buffer();
}
In the file, there is a "Hello World" in it, and some unreadable characters. Executing the buffer will print "Hello World".
I know you can execute a pointer like this:
void (*foo)(int) = &bar; // such that void bar(int)
(*foo)(123);
But executing void(int) function as a void(void) function with its function and parameters in memory is new to me.
Is there a standard of how a function look like in memory (like a string is terminated by a null character) such that you can execute it in this way?
But executing void(int) function as a void(void) function with its
function and parameters in memory is new to me.
Good. It's undefined behavior and you should never do it. It might happen to work on some implementation, but there's no guaranteed result.
Is there a standard of how a function look like in memory (like a
string is terminated by a null character) such that you can execute it
in this way?
No, there isn't. More relevantly, there is no standard for how arguments are passed to functions -- it's up to the implementation, and there is no requirement that it be implemented in such a way that functions with different types are compatible.
But in this case, the programmer is taking advantage of knowledge of what a function looks like in memory, and has prepared the file in such a way that it does the right thing when loaded into memory and run in this particular implementation. This is, after all, what loaders do ... they read programs from files into memory and then execute them. This method is also used by exploits that take advantage of bugs in programs to load nefarious code into the program's memory and get the program to execute it.
Related
I'm trying to fix a bug in very old C code that just popped up recently on Windows after we updated Visual Studio to version 2017. The code still runs on several linux platforms. That tells me we're probably relying on some undefined behavior, and we previously got lucky.
We have a bunch of function calls like this:
get_some_data(parent, "ge", "", type);
Running in debug, I noticed that upon entry to this function, that empty string is immediately filled with garbage, before the function has done anything. The function is declared like this:
static void get_some_data(
KEY Parent,
char *Prefix,
char *Suffix,
ENT EntType)
So is it unwise to pass the strings directly ("ge", "")? I know it's trivial to fix this case by declaring char *suffix="" and passing suffix instead of "", but I'm now questioning whether I need to go through this entire suite of code looking for this type of function call.
So is it unwise to pass the strings directly ("ge", "")?
There is nothing inherently wrong with passing string literals to functions in general.
However, it is unwise to pass pointers (in)to string literals specifically to parameters declared as pointers to non-const char, because C specifies that undefined behavior results from attempting to modify a string literal, and in practice, that UB often manifests as abrupt program termination. If a function declares a parameter as const char * then you can reasonably take that as a promise that it will not attempt to modify the target of that pointer -- which is what you need to ensure -- but if it declares a parameter as just char * then no such promise is made, and the function doesn't even have a way to check at runtime whether the argument is writable.
Possibly you can rely on documentation in place of const-qualification, for you're ok in this regard as long as no attempt is made in practice to modify a string literal, but that still leaves you more open to bugs than you otherwise would be.
I know it's trivial to fix this case by declaring char *suffix="" and passing suffix instead of ""
Such a change may disguise what you're doing from the compiler, so that it does not warn about the function call, but it does not fix anything. The same pointer value is passed to the function either way, and the same semantics and constraints apply. Also, if the compiler warned about the function call then it should also warn about the assignment.
This is not an issue in C++, by the way, or at least not the same issue, because in C++, string literals represent arrays of const char in the first place.
, but I'm now questioning whether I need to go through this entire suite of code looking for this type of function call.
Better might be to modify the signatures of the called functions. Where you intend for it to be ok to pass a string literal, ensure that the parameter has type const char *, like so:
static void get_some_data(
KEY Parent,
const char *Prefix,
const char *Suffix,
ENT EntType)
But do note that is highly likely to cause new warnings about violations of const-correctness. To ensure safety, you need to fix these, too, without casting away constness. This could well cascade broadly, but the exercise will definitely help you identify and fix places where your code was mishandling string literals.
On the other hand, a genuine fix that might be less pervasive would be to pass pointers to modifiable arrays instead of (unmodifiable) string literals. Perhaps that's what you had in mind with your proposed fix, but the correct way to do that is this:
char prefix[] = "ge";
char suffix[] = "";
get_some_data(parent, prefix, suffix, type);
Here, prefix and suffix are separate (modifiable) local arrays, initialized with copies of the string literals' contents.
With all that said, I'm inclined to suspect that if you're getting bona fide runtime errors related to these arguments with VS-compiled executables but not GCC-compiled ones, then the source of those is probably something else. My first guess would be that array bounds are being overrun. My second guess would be that you are compiling C code as C++, and running afoul of one or more of the (other) differences between them.
That's not to say that you shouldn't take a good look at the constness / writability concerns involved here, but it would suck to go through the whole exercise just to find out that you were ok to begin with. You could still end up with better code, but that's a little tricky to sell to the boss when they ask why the bug hasn't been fixed yet.
No, there is absolutely nothing wrong with passing a string literal, empty or not. Quite the opposite — if you try to "fix" your code by doing your trivial change, you will hide the bug and make life harder for whoever is going to fix it in future.
I encountered this same problem and the cause was a similar situation in a recently executed function where, crucially, the contents of the string were being changed.
Prior to the call where this strange behaviour was being observed, there was a call to older code with a parameter of PSTR type. Not realizing that the contents of that parameter were going to be changed, a programmer had supplied an empty string. The code was only ever updating the first character so declaring a char type parameter and supplying the address of that was sufficient to solve the problem that was exhibiting in the later call.
I recently had a question, I know that a pointer to a constant array initialized as it is in the code below, is in the .rodata region and that this region is only readable.
However, I saw in pattern C11, that writing in this memory address behavior will be undefined.
I was aware that the Borland's Turbo-C compiler can write where the pointer points, this would be because the processor operated in real mode on some systems of the time, such as MS-DOS? Or is it independent of the operating mode of the processor? Is there any other compiler that writes to the pointer and does not take any memory breach failure using the processor in protected mode?
#include <stdio.h>
int main(void) {
char *st = "aaa";
*st = 'b';
return 0;
}
In this code compiling with Turbo-C in MS-DOS, you will be able to write to memory
As has been pointed out, trying to modify a constant string in C results in undefined behavior. There are several reasons for this.
One reason is that the string may be placed in read-only memory. This allows it to be shared across multiple instances of the same program, and doesn't require the memory to be saved to disk if the page it's on is paged out (since the page is read-only and thus can be reloaded later from the executable). It also helps detect run-time errors by giving an error (e.g. a segmentation fault) if an attempt is made to modify it.
Another reason is that the string may be shared. Many compilers (e.g., gcc) will notice when the same literal string appears more than once in a compilation unit, and will share the same storage for it. So if a program modifies one instance, it could affect others as well.
There is also never a need to do this, since the same intended effect can easily be achieved by using a static character array. For instance:
#include <stdio.h>
int main(void) {
static char st_arr[] = "aaa";
char *st = st_arr;
*st = 'b';
return 0;
}
This does exactly what the posted code attempted to do, but without any undefined behavior. It also takes the same amount of memory. In this example, the string "aaa" is used as an array initializer, and does not have any storage of its own. The array st_arr takes the place of the constant string from the original example, but (1) it will not be placed in read-only memory, and (2) it will not be shared with any other references to the string. So it's safe to modify it, if in fact that's what you want.
Is there any other compiler that writes to the pointer and does not take any memory breach failure using the processor in protected mode?
GCC 3 and earlier used to support gcc -fwriteable-strings to let you compile old K&R C where this was apparently legal, according to https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Incompatibilities.html. (It's undefined behaviour in ISO C and thus a bug in an ISO C program). That option will define the behaviour of the assignment which ISO C leaves undefined.
GCC 3.3.6 manual - C Dialect options
-fwritable-strings
Store string constants in the writable data segment and don't uniquize them. This is for compatibility with old programs which assume they can write into string constants.
Writing into string constants is a very bad idea; “constants” should be constant.
GCC 4.0 removed that option (release notes); the last GCC3 series was gcc3.4.6 in March 2006. Although apparently it had become buggy in that version.
gcc -fwritable-strings would treat string literals like non-const anonymous character arrays (see #gnasher's answer), so they go in the .data section instead of .rodata, and thus get linked into a segment of the executable that's mapped to read+write pages, not read-only. (Executable segments have basically nothing to do with x86 segmentation, it's just a start+range memory-mapping from the executable file to memory.)
And it would disable duplicate-string merging, so char *foo() { return "hello"; } and char *bar() { return "hello"; } would return different pointer values, instead of merging identical string literals.
Related:
How can some GCC compilers modify a constant char pointer?
https://softwareengineering.stackexchange.com/questions/294748/why-are-c-string-literals-read-only
Linker option: still Undefined Behaviour so probably not viable
On GNU/Linux, linking with ld -N (--omagic) will make the text (as well as data) section read+write. This may apply to .rodata even though modern GNU Binutils ld puts .rodata in its own section (normally with read but not exec permission) instead of making it part of .text. Having .text writeable could easily be a security problem: you never want a page with write+exec at the same time, otherwise some bugs like buffer overflows can turn into code-injection attacks.
To do this from gcc, use gcc -Wl,-N to pass on that option to ld when linking.
This doesn't do anything about it being Undefined Behaviour to write const objects. e.g. the compiler will still merge duplicate strings, so writing into one char *foo = "hello"; will affect all other uses of "hello" in the whole program, even across files.
What to use instead:
If you want something writeable, use static char foo[] = "hello"; where the quoted string is just an array initializer for a non-const array. As a bonus, this is more efficient than static char *foo = "hello"; at global scope, because there's one fewer level of indirection to get to the data: it's just an array instead a pointer stored in memory.
You are asking whether or not the platform may cause undefined behavior to be defined. The answer to that question is yes.
But you are also asking whether or not the platform defines this behavior. In fact it does not.
Under some optimization hints, the compiler will merge string constants, so that writing to one constant will write to the other uses of that constant. I used this compiler once, it was quite capable of merging strings.
Don't write this code. It's not good. You will regret writing code in this style when you move onto a more modern platform.
Your literal "aaa" produces a static array of four const char 'a', 'a', 'a', '\0' in an anonymous location and returns a pointer to the first 'a', cast to char*.
Trying to modify any of the four characters is undefined behaviour. Undefined behaviour can do anything, from modifying the char as intended, pretending to modify the char, doing nothing, or crashing.
It's basically the same as static const char anonymous[4] = { 'a', 'a', 'a', '\0' }; char* st = (char*) &anonymous [0];
To add to the correct answers above, DOS runs in real mode, so there is no read only memory. All memory is flat and writable. Hence, writing to the literal was well defined (as it was in any sort of const variable) at the time.
If the following array contained shell code in a C program on a LINUX machine
char buf [100]
then how does the following execute this shell code :
((void(*)())buf)()
Simple. It casts buf to a pointer-to-function taking no arguments and returning void, and then invokes that function.
However, that probably won't work since the page containing buf is highly unlikely to be marked as executable.
I have the following structure:
struct sys_config_s
{
char server_addr[256];
char listen_port[100];
char server_port[100];
char logfile[PATH_MAX];
char pidfile[PATH_MAX];
char libfile[PATH_MAX];
int debug_flag;
unsigned long connect_delay;
};
typedef struct sys_config_s sys_config_t;
I also have a function defined in a static library (let's call it A.lib):
sys_config_t* sys_get_config(void)
{
static sys_config_t config;
return &config;
}
I then have a program (let's call it B) and a dynamic library (let's call it C). Both B and C link with A.lib. At runtime B opens C via dlopen() and then gets an address to C's function func() via a call to dlsym().
void func(void)
{
sys_get_config()->connect_delay = 1000;
}
The above code is the body of C's func() function and it produces a segmentation fault when reached. The segfault only occurs while running outside of gdb.
Why does that happen?
EDIT: Making sys_config_t config a global variable doesn't help.
The solution is trivial. Somehow, by a header mismatch, the PATH_MAX constant was defined differently in B's and C's compilation units. I need to be more careful in the future. (facepalms)
There is no difference between the variable being a static-local, or a static-global variable. A static variable is STATIC, that means, it is not, on function-call demand, allocated on the stack within the current function frame, but rather it is allocated in one of the preexisting segments of the memory defined in the executable's binary headers.
That's what I'm 100% sure. The question, where in what segment they exactly placed, and whether they are properly shared - is an another problem. I've seen similar problems with sharing global/static variables between modules, but usually, the core of the problem was very specific to the exact setup..
Please take into consideration, that the code sample is small, and I worked on that platforms long time ago. What I've written above might got mis-worded or even be plainly wrong at some points!
I think, that the important thing is that you are getting that segfault in C when touching that line. Setting an integer field to a constant could not have failed, never, provided that target address is valid and not write-protected. That leaves two options:
- either your function sys_get_config() has crashed
- or it has returned an invalid pointer.
Since you say that the segfault is raised here, not in sys_get_config, the only thing left is the latter point: broken pointer.
Add to the sys_get_config some trivial printf that will dump the address-to-be-returned, then do the same in the calling function "func". Check whether it not-null, and also check if within sys_get_config it is the same as after being returned, just to be sure that calling conventions are proper, etc. A good idea for making a double/triple check is to also add inside the module "A" a copy of the function sys_get_config (with different name of course), and to check whether the addresses returned from sys_get_config and it's copy are the same. If they are not - something went very wrong during the linking
There is also a very very small chance that the module loading has been deferred, and you are trying to reference a memory of a module that was not fully initialized yet.. I worked on linux very long time ago, but I remember that dlopen has various loading options. But you wrote that you got the address by dlsym, so I suppose the module has loaded since you've got the symbol's final address..
Is there any way to access the command line arguments, without using the argument to main? I need to access it in another function, and I would prefer not passing it in.
I need a solution that only necessarily works on Mac OS and Linux with GCC.
I don't know how to do it on MacOS, but I suspect the trick I will describe here can be ported to MacOS with a bit of cross-reading.
On linux you can use the so called ".init_array" section of the ELF binary, to register a function which gets called during program initilization (before main() is called). This function has the same signature as the normal main() function, execept it returns "void".
Thus, you can use this function to remember or process argc, argv[] and evp[].
Here is some code you can use:
static void my_cool_main(int argc, char* argv[], char* envp[])
{
// your code goes here
}
__attribute__((section(".init_array"))) void (* p_my_cool_main)(int,char*[],char*[]) = &my_cool_main;
PS: This code can also be put in a library, so it should fit your case.
It even works, when your prgram is run with valgrind - valgrind does not fork a new process, and this results in /proc/self/cmdline showing the original valgrind command-line.
PPS: Keep in mind that during this very early program execution many subsystem are not yet fully initialized - I tried libc I/O routines, they seem to work, but don't rely on it - even gloval variables might not yet be constructed, etc...
In Linux, you can open /proc/self/cmdline (assuming that /proc is present) and parse manually (this is only required if you need argc/argv before main() - e.g. in a global constructor - as otherwise it's better to pass them via global vars).
More solutions are available here: http://blog.linuxgamepublishing.com/2009/10/12/argv-and-argc-and-just-how-to-get-them/
Yeah, it's gross and unportable, but if you are solving practical problems you may not care.
You can copy them into global variables if you want.
I do not think you should do it as the C runtime will prepare the arguments and pass it into the main via int argc, char **argv, do not attempt to manipulate the behaviour by hacking it up as it would largely be unportable or possibly undefined behaviour!! Stick to the rules and you will have portability...no other way of doing it other than breaking it...
You can. Most platforms provide global variables __argc and __argv. But again, I support zneak's comment.
P.S. Use boost::program_options to parse them. Please do not do it any other way in C++.
Is there some reason why passing a pointer to space that is already consumed is so bad? You won't be getting any real savings out of eliminating the argument to the function in question and you could set off an interesting display of fireworks. Skirting around main()'s call stack with creative hackery usually ends up in undefined behavior, or reliance on compiler specific behavior. Both are bad for functionality and portability respectively.
Keep in mind the arguments in question are pointers to arguments, they are going to consume space no matter what you do. The convenience of an index of them is as cheap as sizeof(int), I don't see any reason not to use it.
It sounds like you are optimizing rather aggressively and prematurely, or you are stuck with having to add features into code that you really don't want to mess with. In either case, doing things conventionally will save both time and trouble.