How can I execute an executable from memory? - c

Let's say I have included a binary in my program at compile time, so I keep it in a variable, something like:
var myExec =[]byte{'s','o','m','e',' ','b','y','t','e','s'}
So my question is whether there is a way to execute this binary from within my program without writing it back to disk and calling exec or fork on it.
I am writing my app in Go, so I am looking for a way to do this in Go or in C (via cgo).
Basically, I am looking for something like piping a shell script into bash; I just don't know where I could pipe the bytes of a native executable to run it. Writing it back to disk and then letting the OS read it again seems like a lot of unnecessary work.

In C and assuming Linux, you can change the protection of a memory region by means of the mprotect() system call, so that it can be executed (i.e.: turn a data region into a code region). After that, you could execute that region of memory by jumping into it.

Related

Is it possible to "create" garbage memory data in one C program, and then later read in that same data in from another C program?

I would like to know if it is possible to in the first C program:
Allocate and declare an int to the value of 5 in memory
Print out the address of the variable (eg: 0x7ffee6a98ad8)
Terminate
And then in a second C program, after the first has completely finished executing:
Read in the data that was previously declared at address 0x7ffee6a98ad8
Print the correct value of 5
Is this a possibility in C?
If so, how would one go about accomplishing such a task?
I am using Linux, if that matters.
It once was possible. If you wrote a large C program in DOS and allocated some RAM with malloc(), you could in fact save that address somewhere (like on disk), start another C program, and read that memory.
I hear it's still possible on deeply embedded platforms, but on modern multi-user operating systems, when you allocate RAM from the OS it clears the RAM first so you can't see it.
Question edited to say Linux. Well no, but also yes. Open up the shell process with ptrace(), allocate some memory in the shell process and write to it, and the next program can find it there. This is nothing like wild pointer games, and is really quite tricky. https://techryptic.github.io/2018/04/07/Using-PTRACE-to-Inspect-&-Alter-Memory/ The window is closing; they're starting to tighten things so you can't debug any processes but your own child processes because they don't want a sudo disaster.

How to run an arbitrary script or executable from memory?

I know I can use a system call like execl("/bin/sh", "sh", "-c", some_string, (char *)NULL) to interpret a "snippet" of shell code using a particular shell/interpreter. But in my case I have an arbitrary string in memory that represents some complete script which needs to be run. That is, the contents of this string/memory buffer could be:
#! /bin/bash
echo "Hello"
Or they might be:
#! /usr/bin/env python
print "Hello from Python"
I suppose in theory the string/buffer could even include a valid binary executable, though that's not a particular priority.
My question is: is there any way to have the system launch a subprocess directly from a buffer of memory I give it, without writing it to a temporary file? Or at least, a way to give the string to a shell and have it route it to the proper interpreter?
It seems that all the system calls I've found expect a path to an existing executable, rather than something low level which takes an executable itself. I do not want to parse the shebang or anything myself.
You haven't specified the operating system, but since #! is specific to Unix, I assume that's what you're talking about.
As far as I know, there's no system call that will load a program from a block of memory rather than a file. The lowest-level system call for loading a program is the execve() function, and it requires a pathname of the file to load from.
My question is: is there any way to have the system launch a
subprocess directly from a buffer of memory I give it, without writing
it to a temporary file? Or at least, a way to give the string to a
shell and have it route it to the proper interpreter?
It seems that all the system calls I've found expect a path to an
existing executable, rather than something low level which takes an
executable itself. I do not want to parse the shebang or anything
myself.
Simple answer: no.
Detailed answer:
execl and shebang convention are POSIXisms, so this answer will focus on POSIX systems. Whether the program you want to execute is a script utilizing the shebang convention or a binary executable, the exec-family functions are the way for a userspace program to cause a different program to run. Other interfaces such as system() and popen() are implemented on top of these.
The exec-family functions all expect to load a process image from a file. Moreover, on success they replace the contents of the process in which they are called, including all memory assigned to it, with the new image.
More generally, substantially all modern operating systems enforce process isolation, and one of the central pillars of process isolation is that no process can access another's memory.

Reusing/Saving read in arrays of data between runs in C

I am programming in C and using bash as my shell. Currently I am trying to optimize how I run my program. The general gist of the program is to take some input parameters and read in a data file; the program then runs some calculations based on the input parameters and the data from the file. I often run this code hundreds of times at a time, changing only the input parameters for each run and not the data from the file. I do this using a shell script that uses xargs to run the executable with various parameters.
printf "%s\n" {0..n} | xargs -P 8 -n 1 ./program
The problem is that I have a very large data file which takes more than 1 second to read in. This happens at every call to the executable, even though the data being read in often does not change! Therefore I believe I could save a lot of time by saving this data somehow, so that other calls of the executable can use the data that has already been read in instead of wasting time reading it in themselves.
I was thinking maybe there could be another program that reads in the data, then protects and sends the address of the data to my current executables. After the executables have finished running in full, this is relayed back to the new program which then releases the data. Is this possible? Or is there another way which would be superior?
You could try storing the file in shared memory: /dev/shm
Ex: ls > /dev/shm/ls-output
./program < /dev/shm/ls-output
Not sure if this is what you were looking for, but something along this line might be helpful I guess.
I do not know the answer to your question, but it would solve your problem if you could only coax the filesystem to keep the file in a shared memory segment, then to publish the pointer and segment identifier needed to access the file in memory. See this kernel document, and this one, too. They may help.
Also, once you have solved your problem, please post your solution as an answer, and also comment below my answer so that Stack Overflow flags me to come back here and to look again.

Why isn't the "extending Tcl" example working?

Link to the example... on wiki.tcl.tk
There is an example here for extending tcl through the use of an executable module that communicates through pipes. (Located in the section marked Executable Modules (EM))
I have compiled the program using Ubuntu and Windows XP. When I try to run the script that tests the modules - they both hang.
What's missing from the example?
Looks like the example is missing the flushing of the output side of the pipes. The data is being held in the I/O library's output buffers (waiting for a few kilobytes to build up) instead of actually being sent immediately to the other process. Note that this buffering only happens when the output is directed to something other than a terminal, so you won't see it when testing interactively. (It's also not important if lots of data is being written, where the improved efficiency of that buffering is a win.)
On the C side, add this line at the top of the main function:
setvbuf(stdout, NULL, _IONBF, 0);
On the Tcl side, add this immediately after the starting of the subprogram:
fconfigure $mathModule -buffering none
The C side can also be done by calling fflush after every printf. If you're stuck with a real C program whose source you don't have access to, you can still make progress by wrapping the whole program with the unbuffer program (actually a Tcl script that uses magic with Expect to make the subprocess think it's talking to a terminal). The one downside of unbuffer is that it uses a virtual terminal, which comes from a far more restricted pool of resources than plain old process IDs (let alone pipes/file descriptors).
I'm having success using Expect to work with the example C program; it's not hanging. It's another thing to learn, but it gets the job done. Also, I'm learning flex/bison to replace the C code in the example.

popen performance in C

I'm designing a program I plan to implement in C and I have a question about the best way (in terms of performance) to call external programs. The user is going to provide my program with a filename, and then my program is going to run another program with that file as input. My program is then going to process the output of the other program.
My typical approach would be to redirect the other program's output to a file and then have my program read that file when it's done. However, I understand I/O operations are quite expensive and I would like to make this program as efficient as possible.
I did a little bit of looking and I found the popen command for running system commands and grabbing the output. How does the performance of this approach compare to the performance of the approach I just described? Does popen simply write the external program's output to a temporary file, or does it keep the program output in memory?
Alternatively, is there another way to do this that will give better performance?
On Unix systems, popen will pass data through an in-memory pipe. Assuming the data isn't swapped out, it won't hit disk. This should give you just about as good performance as you can get without modifying the program being invoked.
popen does pretty much what you are asking for: it does the pipe-fork-exec idiom and gives you a file pointer that you can read and write from.
However, the pipe buffer has a limited size (a few KiB on some systems; the default on modern Linux is 64 KiB), and if you aren't reading quickly enough, the other process could block.
Do you have access to shared memory as a mount point? [on linux systems there is a /dev/shm mountpoint]
1) popen keeps the program output in memory. It actually uses pipes to transfer data between the processes.
2) popen looks IMHO as the best option for performance.
It also has an advantage over files in that it reduces latency, i.e. your program will be able to get the other program's output on the fly, while it is being produced. If this output is large, you don't have to wait until the other program has finished to start processing its output.
The problem with having your subcommand redirect to a file is that it's potentially insecure while popen communication can't be intercepted by another process. Plus you need to make sure the filename is unique if you're running several instances of your master program (and thus of your subcommand). The popen solution doesn't suffer from this.
The performance of popen is just fine as long as you don't read/write one-byte chunks. Always read/write in multiples of 512 bytes (such as 4096). But that applies to file operations as well. popen connects your process and the child process through pipes, so if you don't read, the pipe fills up and the child can't write, and vice versa. So all the exchanged data is in memory, but only in small amounts at a time.
(Assuming Unix or Linux)
Writing to the temp file may be slow if the file is on a slow disk. It also means the entire output will have to fit on the disk.
popen connects to the other program using a pipe, which means that output will be sent to your program incrementally. As it is generated, it is copied to your program chunk-by-chunk.

Resources