C - array address used to execute code

If the following array contained shell code in a C program on a Linux machine:
char buf[100];
then how does the following execute this shell code?
((void(*)())buf)();

Simple. It casts buf to a pointer-to-function taking no arguments and returning void, and then invokes that function.
However, that probably won't work since the page containing buf is highly unlikely to be marked as executable.

Related

How does C parse command line arguments?

I'm curious as to what C does exactly to parse command line arguments. For example, assume I have a program named myProgram that takes in two arguments like this
./myProgram arg1 arg2
If I were to call
./myProgram arg1$'\0otherstuff' arg2
arg1 and arg2 would still print if we were to print argv[1] and argv[2], ignoring $'\0otherstuff', but where does it go? Is it stored in memory behind arg1? Could it potentially overwrite any buffer? How is arg2 read if there's a null character before it?
Converting ./myProgram arg1 arg2 into the C-style int argc, char *argv[] is done by the operating system or by the shell (it depends). C does not parse the arguments; you parse the arguments in C. C is a programming language, not an entity. The form int argc, char *argv[] is how the C programming language passes the arguments to the main function, but other programming languages may use a different form; for C, see the documentation of the main function.
On Linux, one may use the execve system call to specify the arguments passed to a program. Parsing from the form ./myProgram arg1 arg2 to execve arguments is done by the shell (e.g. bash), which constructs the argv array and passes it to the execve call.
Your shell is probably ignoring the $'\0otherstuff' part, because under POSIX a filename cannot contain the NUL character (assuming your shell is POSIX compatible).
When calling an executable, your OS kernel will take the additional arguments (as plain text) and copy them into the program's memory.
Before the main function is called, a small piece of startup code runs, which passes the given arguments to the actual main function in C.
Experimenting with bash (version 3.2.57(1)-release (x86_64-apple-darwin17)) suggests that the “otherstuff” in your example is not passed to the program. When a program is called with the command line you show, the memory pointed to by argv[1] contains “arg1”, then a null character, then “arg2”. Thus, the null and “otherstuff” in your command line have not been passed to the program.
(Hypothetically: If the shell were to pass it to the program, I would expect it would pass it in the memory continuing from that pointed to by argv[1], and there would be no danger of it overwriting any buffer. If the shell were designed to tolerate an embedded null character in an argument, I expect (based on how we design things) that it would treat the argument as a complete string and provide the necessary space to hold it.)
The fact that the argument prior to “arg2” contains a null character is irrelevant to the handling of “arg2”. After initial processing of the command line, the shell does not treat the line as one string. It has divided it into words or other units and handles them with its own data structures. So the presence of null characters in prior arguments has no effect on later arguments.
Additionally, it may not be possible for the shell to pass an argument containing an embedded null character. The routines typically used to execute a program, such as execl, accept the arguments as null-terminated strings. So the embedded null terminates the string, and the execl routine never passes anything beyond the null character.

Will the command line arguments be "passed twice"? [duplicate]

I am trying to understand how command line arguments work in details.
This is what I think happens:
When you compile a source code that contains the main() function in C, the generated object file will be linked with the CRT, and the entry point for the program will be the _start() function (which exists in the CRT), and _start() will call main().
Now when you run your program and pass it some command line arguments, the command line arguments will be passed to the _start() function, and then _start() will re-pass the command line arguments to main().
Am I correct?
Yes and no:
The _start() function is not a C function but an assembler function. The reason for this is that the CPU is not in a "state" which is required by C programs so the _start() function also has to set up the CPU for executing C code.
One difference between the "state" required by C programs and the "state" of the CPU when _start() is called is the way arguments (here: command line arguments) are stored.
Under Linux (at least 32 bit - I don't know about 64 bit) you actually have an array that later represents argv. _start() has to calculate the location of argv and then pass the calculated value to main().
Under Windows there is a function that returns the entire command line as a pointer to a single string (const char *)! The _start() function has to call that function and then split the string into the parts that will later become argv.

Is it safe to use the argv pointer globally?

Is it safe to use the argv pointer globally? Or is there a circumstance where it may become invalid?
i.e: Is this code safe?
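#include <stdio.h>

char **largs;

void function_1(void)
{
    /* largs[1] is a null pointer when no argument was given */
    if (largs[1] != NULL)
        printf("Argument 1: %s\n", largs[1]);
}

int main(int argc, char **argv)
{
    largs = argv;
    function_1();
    return 0;
}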
Yes, it is safe to use argv globally; you can use it as you would use any char** in your program. The C99 standard even specifies this:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
The C++ standard does not have a similar paragraph, but the same is implicit with no rule to the contrary.
Note that C++ and C are different languages and you should just choose one to ask your question about.
It should be safe so long as main() function does not exit. A few examples of things that can happen after main() exits are:
Destructors of global and static variables
Threads running longer than main()
Stored argv must not be used in those.
The reference doesn't say anything which would give a reason to assume that the lifetimes of the arguments to main() function differ from the general rules for lifetimes of function arguments.
So long as argv pointer itself is valid, the C/C++ runtime must guarantee that the content to which this pointer points is valid (of course, unless something corrupts memory). So it must be safe to use the pointer and the content that long. After main() returns, there is no reason for the C/C++ runtime to keep the content valid either. So the above reasoning applies to both the pointer and the content it points to.
is it safe to use the argv pointer globally
This requires a little more clarification. As the C11 spec says in chapter §5.1.2.2.1, Program startup
[..].. with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared)
That means, the variables themselves have a scope limited to main(). They are not global themselves.
Again the standard says,
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
That means the lifetime of these variables lasts until main() finishes execution.
So, if you're using a global variable to hold the value from main(), you can safely use those globals to access the same in any other function(s).
This thread on the comp.lang.c.moderated newsgroup discusses the issue at length from a C standard point of view, including a citation showing that the contents of the argv arrays (rather than the argv pointer itself, if e.g. you took an address &argv and stored that) last until "program termination", and an assertion that it is "obvious" that program termination has not yet occurred in a way relevant to this while the atexit-registered functions are executing:
The program has not terminated during atexit-registered
function processing. We thought that was pretty obvious.
(I'm not sure who Douglas A. Gwyn is, but it sounds like "we" means the C standard committee?)
The context of the discussion was mainly concerning storing a copy of the pointer argv[0] (program name).
The relevant C standard text is 5.1.2.2.1:
The parameters argc and argv and the strings pointed to by the
argv array shall be modifiable by the program, and retain their
last-stored values between program startup and program
termination.
Of course, C++ is not C, and its standard may subtly differ on this issue or not address it.
You can either pass them as parameters, or store them in global variables. As long as you don't return from main and try to process them in an atexit handler or the destructor of a variable at global scope, they still exist and will be fine to access from any scope.
Yes, it is safe for either C or C++, because no thread is running after main has finished.

Executing void(char*) function as a void(void)

void print( char * str ){ printf("%s",str); }
void some_function(){
... Loads a file into `char * buffer` ...
... Stores the function print before `buffer` address ...
((void(*)(void))buffer)();
}
In the file, there is a "Hello World" in it, and some unreadable characters. Executing the buffer will print "Hello World".
I know you can execute a pointer like this:
void (*foo)(int) = &bar; // such that void bar(int)
(*foo)(123);
But executing void(int) function as a void(void) function with its function and parameters in memory is new to me.
Is there a standard for what a function looks like in memory (like a string is terminated by a null character) such that you can execute it in this way?
But executing void(int) function as a void(void) function with its
function and parameters in memory is new to me.
Good. It's undefined behavior and you should never do it. It might happen to work on some implementation, but there's no guaranteed result.
Is there a standard for what a function looks like in memory (like a
string is terminated by a null character) such that you can execute it
in this way?
No, there isn't. More relevantly, there is no standard for how arguments are passed to functions -- it's up to the implementation, and there is no requirement that it be implemented in such a way that functions with different types are compatible.
But in this case, the programmer is taking advantage of knowledge of what a function looks like in memory, and has prepared the file in such a way that it does the right thing when loaded into memory and run in this particular implementation. This is, after all, what loaders do ... they read programs from files into memory and then execute them. This method is also used by exploits that take advantage of bugs in programs to load nefarious code into the program's memory and get the program to execute it.

Can I use C headers in Delphi?

Is it possible to compile C headers with the Rad Studio XE C++ compiler and link them with Delphi code?
Thus eliminating the need to convert header files to Pascal?
The reason for the question is:
C - Header definition:
DLLEXPORT int url_engine_version(char *version, size_t length);
Attempt at Delphi Definition
function url_engine_version(version: PByte; var length: cardinal): integer;
cdecl; external 'corplib.dll';
Main app tried to call it using:
engVer: Pointer;
engLen: cardinal;
engLen := 64;
GetMem(engVer,engLen);
url_engine_version(engVer,engLen);
But Delphi raises an access violation (AV) when it tries to call the routine.
Working C# definition, which works if I pass a StringBuilder predefined with length 64:
[DllImport("corplib.dll", CharSet = CharSet.Ansi,
CallingConvention = CallingConvention.Cdecl)]
public static extern int url_engine_version(StringBuilder version, [Out] int length);
The answer to your actual question is: No. Sorry.
However, I think in this case the problem is quite simple...
Either the C Header you have reproduced here is wrong or the C# declaration is or you have just been very lucky to not have the C# code crash and burn as badly as the Delphi code.
The problem, I think, is that the C header declares the length parameter as a size_t, NOT a pointer to a size_t, i.e. it is an input parameter, not an output or in/out parameter.
You presumably use length to specify the size of buffer allocated for the pointer you pass in version. I further presume that the function returns the number of actual bytes used for the data placed in the version buffer.
The Delphi version crashes, I believe, because by specifying length as var, you are passing length by reference, i.e. the function receives not the value 64 but a pointer to a variable holding 64, and it then uses that pointer value as the length.
The C# code may be dodging the bullet by (also incorrectly, if the C Header itself is correct) declaring the parameter as out. This may translate into something that, if not correct, is at least not as "damagingly incorrect" at runtime.
I think simply removing the "var" from the length parameter declaration should solve your problem:
function url_engine_version(aVersion: PByte; aLength: cardinal): integer; cdecl; external 'corplib.dll';
The main problem with converting headers is the difference between the type systems. A good Delphi header can't be derived mechanically from C.
For example, C doesn't distinguish pointers to one element from pointers to arrays. It doesn't distinguish bools and ints. A char* can mean a zero-terminated string, a pointer to bytes, a pointer to a single char, a char passed by reference, ...
And in your example the Delphi code passes the last parameter by reference (i.e. as a pointer to UInt32) and the C code doesn't. But I don't understand why the C# code works.
Project JEDI have done a lot of C header conversions for Delphi. They have an excellent set of resources, tutorials etc. on their website.
They also have a tool that can automate this which was actually derived from original code by Bob Swart.
