Load an exe file and call a function from it in DOS - c

I have a program (A), and there is another executable file (B) in the same folder. I must call a function from this other program (B) in my program (A), and all of this must be done in DOS. How can I do it, or what should I read to learn how? Please help.

If your two programs are separate executable files, they will most likely run in two different processes. You cannot just call functions across two different processes; you need to use some inter-process communication (IPC) mechanism.
You need to understand the basics and make a start somewhere, and this seems to be a good place to do so.
Since you mention DOS as the target platform: DOS is a non-preemptive, single-user, single-tasking environment, but TSRs (terminate-and-stay-resident programs) in DOS emulate a form of multiprocessing. To implement IPC in DOS, you will have to arrange for the TSR to hook a software interrupt and then communicate with it through that interrupt.

MS-DOS is a 16-bit OS. The executables that run in MS-DOS come in two flavours: ".exe" and ".com". Think of the ".com" as a ".exe" with lots of default values assumed by the OS. The ".exe" files contain a header which is read by the OS to determine various parameters. One of these parameters is the entry point address. Only one entry point address is defined (and for a ".com" it is always cs:0x100), and that is the address the OS jumps to when the program has been loaded.
MS-DOS has functions to load another executable and run it, but it can only run from the address given in the header. No other function address is exported, so you can't just call some arbitrary function in the other executable. There is no DLL system in MS-DOS.
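To make the "only one entry point" remark concrete, here is a minimal sketch that pulls the initial CS:IP out of an ".exe" (MZ) header. The names `mz_entry_point` and `read_le16` are made up for illustration; the offsets 0x14 (initial IP) and 0x16 (initial CS, relative to the load segment) come from the standard MZ header layout. It is plain portable C, so it can be tried on any machine, not only under DOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Extract the initial CS:IP from an MZ (".exe") header.
 * Returns 0 on success, -1 if the buffer is too short or does not
 * start with the "MZ" signature. (Hypothetical helper, not a DOS API.)
 */
int mz_entry_point(const unsigned char *hdr, size_t len,
                   uint16_t *cs, uint16_t *ip)
{
    assert(hdr != NULL && cs != NULL && ip != NULL);
    if (len < 0x18 || hdr[0] != 'M' || hdr[1] != 'Z')
        return -1;
    *ip = read_le16(hdr + 0x14);  /* initial IP */
    *cs = read_le16(hdr + 0x16);  /* initial CS, relative to the load segment */
    return 0;
}
```

Note that this is the only address the header gives you. Nothing in it describes any other function inside the file.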
So, in order to call some arbitrary function in the second executable, you need to create your own DLL-style system. This is not trivial since the OS uses a segmented memory model, that is, memory is addressed through 64k segments and addresses are formed from the segment address added to an offset, e.g. segment*16 + offset. So, there are up to 2^12 ways to express the same physical address. During the loading process, MS-DOS has to fix up these segment values to reflect the actual location in memory the program has been loaded to. Remember, in MS-DOS there is no virtual memory. If you create your own DLL system, you will need to do this fixing-up yourself for code that's bigger than 64k (code+data less than 64k can ignore segments and treat all addresses as just 16-bit offsets).
Even if you knew the address, loading the ".exe" using the MS-DOS API would still be tricky, as you'd need to know the CS (code segment) address the executable has been loaded to.
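The segment*16 + offset arithmetic can be checked with a few lines of portable C. This is only a sketch of the address calculation, and the function names `phys_addr` and `count_aliases` are invented for the example. It ignores the 1 MiB wraparound of a real 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Count how many segment:offset pairs name the same physical address. */
unsigned count_aliases(uint32_t phys)
{
    unsigned n = 0;
    assert(phys <= 0x10FFEFu);  /* largest value of 0xFFFF:0xFFFF */
    for (uint32_t seg = 0; seg <= 0xFFFFu; seg++) {
        uint32_t base = seg << 4;
        /* The offset needed from this segment must fit in 16 bits. */
        if (base <= phys && phys - base <= 0xFFFFu)
            n++;
    }
    return n;
}
```

For example, 0x1234:0x0005 and 0x1000:0x2345 both resolve to physical address 0x12345, and `count_aliases(0x12345)` gives 4096 (2^12). Addresses near the bottom of memory have fewer aliases, which is why the count is an upper bound.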

Related

What is the canonical way to execute code directly from a QEMU device?

I'm modeling a particular evaluation board, which has a leon3 processor and several banks of MRAM mapped to specific addresses. My goal is to start qemu-system-sparc using my bootloader ELF, and then jump to the base address of a MRAM bank to begin executing bare-metal programs therein. To this end, I have been able to successfully run my bootloader and jump to the first instruction, but QEMU immediately stops and exits without reporting any error/trap. I can also run the bare-metal programs in isolation by passing them in ELF format as a kernel to qemu-system-sparc.
Short version: Is there a canonical way to set up a device such that code can be executed from it directly? What steps do I need to take when compiling that code to allow it to execute correctly?
I modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array with my program. In my board code (modified version of qemu/hw/sparc/leon3.c), writes to the MRAM address are mapped to the MemoryRegion of the device. Using printfs, I am reporting reads and writes in the style of the unimplemented device (qemu/hw/misc/unimp.c), and I have verified that I am reading and writing to the device correctly.
Unfortunately, this did not work with respect to running the code on the device. I can see the read immediately after the bootloader jumps to the base address of my device, but the instruction read doesn't actually do anything. The bootloader uses a void function pointer, which is tied to the address of the MRAM device to induce a jump.
Another approach I tried is creating an alias to my device starting from address 0; I thought perhaps that my binary has all its addresses set relative to zero, so by mapping writes from addresses [0, MRAM_SIZE) as an alias to my device base address, the code will end up reading the corresponding instructions in the device MemoryRegion.
This approach failed an assert in memory.c:
static void memory_region_add_subregion_common(MemoryRegion *mr,
                                               hwaddr offset,
                                               MemoryRegion *subregion)
{
    assert(!subregion->container);
    subregion->container = mr;
    subregion->addr = offset;
    memory_region_update_container_subregions(subregion);
}
What do I need to do to coerce QEMU to execute the code in my MRAM device? Do I need to produce a binary with absolute addresses?
Older versions of QEMU were simply unable to handle execution from anything other than RAM or ROM, and attempting to do so would give a "qemu: fatal: Trying to execute code outside RAM or ROM" error. QEMU 3.1 and later fixed this limitation, and now can execute code from anywhere -- though execution from a device will be much much slower than executing from RAM.
You mention that you "modeled the MRAM as a device with a MemoryRegion, along with the appropriate read and write operations to expose a heap-allocated array". This sounds like it is probably the wrong approach -- it will work but be very slow. If the MRAM appears to the guest as being like RAM, then model it as RAM (ie with a RAM MemoryRegion). If it's like RAM for reading but writes need to do something other than just-write-to-the-memory (or need to do that some of the time), then model it using a "romd" region, the same way the existing pflash devices do. Nonetheless, modelling it as a device with pure read and write functions should work, it'll just be horribly slow.
The assertion you've run into is the one that says "you can't put a memory region into two things at once" -- the 'subregion' you've passed in is already being used somewhere else, but you've tried to put it into a second container. If you have a MemoryRegion that you need to have appear in two places in the physical memory map, then you need to: create the MemoryRegion; create an alias MemoryRegion that aliases the real one; map the actual MemoryRegion into one place; map the alias into the other. There are plenty of examples of this in existing board models in QEMU.
More generally, you need to figure out what the evaluation board hardware actually is, and then model that. If the eval board has the MRAM visible at multiple physical addresses, then yes, use an alias MR. If it doesn't, then the problem is somewhere else and you need to figure out what's actually happening, not try to bodge around it with aliases that don't exist on the real hardware. QEMU's debug logging (various -d suboptions, plus -D file to log to a file) can be useful for checking what the emulated CPU is really doing in this early bootup phase -- but watch out as the logs can be quite large and they are sometimes tricky to interpret unless you know a little about QEMU internals.

Is there a way to "test" the use of "volatile" keyword in C on a Desktop computer running Linux or Windows?

I know that 'volatile' keyword in C is used to tell the compiler to NOT load the variable from RAM memory into a register or into cache and to ALWAYS read the variable from the computer working memory.
However I also read that the use case is when another device is modifying the value at the memory address stored in the variable.
My question is:
Is there any possibility to modify the value of a memory address while a program is running on a Linux or a Windows machine that also has a MMU and uses virtual address space for its programs (like all modern machines)?
Is it possible to change a variable of a program from another program (running in a different process not only a different thread) ?
Is there any possibility to modify the value of a memory address while a program is running on a Linux or a Windows machine that also has a MMU and uses virtual address space for its programs (like all modern machines)?
Yes, of course!
The obvious example is threading: another thread could be updating the memory you're looking at, so you don't want to assume it never changes.
Other examples include:
Shared memory. Processes can agree to share a piece of memory for efficient IPC.
mmap. A program can map a file into memory. When the file changes, the corresponding memory also changes (on Linux, this is the basis of shared memory).
DMA. Other devices, like hard drives, can be asked to write data directly to RAM for efficient transfers.
Is it possible to change a variable of a program from another program (running in a different process not only a different thread) ?
Yes. If the processes agree, you can use shared memory.
If they don't, one can attach itself to another as a debugger and inspect/modify its memory.
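As a rough illustration of the "processes agree to share memory" case, here is a minimal POSIX sketch: the parent maps one shared anonymous int, forks, and then reads what the child process stored there. The function name `value_written_by_child` is made up for the example, and it assumes a Linux-like system where `MAP_ANONYMOUS` is available.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map one shared int, fork, let the child write `value` into it,
 * and return what the parent reads afterwards (-1 on error).
 */
int value_written_by_child(int value)
{
    assert(value != -1);  /* -1 is reserved for reporting errors */

    volatile int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {           /* child: a separate process, same physical page */
        *shared = value;
        _exit(0);
    }

    waitpid(pid, NULL, 0);    /* parent: wait, then read what the child stored */
    int seen = *shared;
    munmap((void *)shared, sizeof *shared);
    return seen;
}
```

The parent never assigns `value` itself, so whatever it reads back was changed by another process while the program was running.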
There are a couple of questions here, so I'll tackle the first. "...test the use of volatile keyword..."
I suppose you could compile the module without the volatile keyword down to assembler (I believe it is the -S option). The same could be done for the same code modified to use the volatile keyword, and then a diff tool could put a spotlight on the changes. I would expect that, with the volatile keyword, every access loads the variable directly from its memory location.
This could also be verified by looking at the .map listing to know in advance what the actual location of the volatile variable is so you have a basis for comparison.
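A small pair of functions is enough to try the compile-and-diff idea. This is only a sketch: the file name `volatile_demo.c` and the function names are hypothetical. Compiling with something like `gcc -O2 -DNDEBUG -S volatile_demo.c` and comparing the two function bodies should typically show one load in the plain version and two separate loads in the volatile one, though the exact output depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

/* Plain read: the optimizer is free to load *p once and reuse the value. */
int read_twice_plain(const int *p)
{
    assert(p != NULL);
    int a = *p;
    int b = *p;
    return a + b;
}

/* Volatile read: each access to *p must be performed as written. */
int read_twice_volatile(const volatile int *p)
{
    assert(p != NULL);
    int a = *p;
    int b = *p;
    return a + b;
}
```

In a single-threaded test both functions return the same value. The difference only shows up in the generated assembly, or when something else changes the variable between the two reads.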

What and where exactly is the loader?

I understand every bit of the C compilation process (how the object files are linked to create the executable). But about the loader itself (which starts the program running) I have a few doubts.
Is the loader part of the kernel?
How exactly is the ./firefox or some command like that loaded? I mean you normally type such commands into the terminal which loads the executable I presume. So is the loader a component of the shell?
I think I'm also confused about where the terminal/shell fits into all of this and what its role is.
The format of an executable determines how it will be loaded. For example, executables with "#!" as the first two characters are loaded by the kernel by executing the named interpreter and feeding the file to it as the first argument. If the executable is formatted as a PE, ELF, or Mach-O binary, then the kernel uses an interpreter for that format that is built into the kernel in order to find the executable code and data and then choose the next step.
In the case of a dynamically linked ELF, the next step is to execute the dynamic loader (usually ld.so) in order to find the libraries, load them, and resolve the symbols. This all happens in userspace. The kernel is more or less unaware of dynamic linking, because it all happens in userspace after the kernel has handed control to the interpreter named in the ELF file.
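The "format determines how it is loaded" step boils down to looking at the first few bytes of the file. Here is a minimal sketch of that kind of check. The function name `classify_executable` and its return strings are invented for the example, and it only knows about "#!", ELF and MZ signatures (Mach-O is left out to keep it short).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Classify a file by its leading bytes; returns a static description string. */
const char *classify_executable(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return "script";   /* the kernel runs the interpreter named on this line */
    if (len >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return "elf";      /* ELF magic: 0x7f 'E' 'L' 'F' */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "mz";       /* DOS MZ header; PE files start with one too */
    return "unknown";
}
```

A real kernel does much more after this point, but the dispatch itself is about this simple.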
The corresponding system call is exec. It is part of the kernel and is responsible for tearing down the address space of the calling process and setting up a fresh one containing everything needed to run the new code. This lives in the kernel because an address space is a kind of sandbox that protects processes from one another, and something that critical is the kernel's responsibility.
The shell is just in charge of interpreting what you type and transforming it into the proper structures (lists or arrays of C strings) to pass to some exec call (after having, most of the time, spawned a new process with fork).
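What the shell does with a command line can be sketched in a few lines of POSIX C: build an argument vector, fork, exec in the child, and wait in the parent. The function name `run_command` is made up for the example, and real shells do a lot more (redirections, job control, and so on).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with a NULL-terminated argument vector the way a shell would.
 * Returns the child's exit status, or -1 on failure.
 */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);  /* replaces the child's address space */
        _exit(127);             /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, calling it with `{ "/bin/sh", "-c", "exit 3", NULL }` returns 3. The loading itself happens inside the exec call, in the kernel, not in the shell.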

How to get address information from library to be shared among all processes?

In Understanding the Linux Kernel, 3rd edition, it says:
Shared libraries are especially convenient on systems that provide file memory mapping, because they reduce the amount of main memory requested for executing a program. When the dynamic linker must link a shared library to a process, it does not copy the object code, but performs only a memory mapping of the relevant portion of the library file into the process’s address space. This allows the page frames containing the machine code of the library to be shared among all processes that are using the same code. Clearly, sharing is not possible if the program has been linked statically. (page 817)
I am interested in this and want to write a small program in C to verify it: given two pids as input, such as those of two gedit processes, get the address information of the page frames that are shared. Does anyone know how to do it? From that book, I think the bss segment and text segment addresses of two or more gedit processes are the same, is that correct?
It is not the text and bss sections of your gedit (or whatever) that have the same address, but the content of the libc.so shared library - and all other shared libraries used by the two gedit processes.
This, as the quoted text says, allows the shared library to be ONE copy, and this is the main benefit of the shared library in general.
bss is generally not shared, since that is per-process data. The text sections of two processes running the same executable, in Linux, will share the same code.
Unfortunately, the proof of this would be to look at the physical mapping of pages (page at address X in process A is at physical address Y, and page for address X in process B is also at physical address Y) within the processes, and that's, as far as I know, not easily available without poking about inside the OS kernel.
Look at the contents of /proc/*/maps.
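A minimal sketch of reading such a maps file from C follows. The function name `count_mappings` is made up, and it assumes the Linux /proc layout and lines shorter than the buffer. Note that maps only shows virtual ranges and the file backing them, not physical page frames, so it can show that two processes map the same library file but not prove that they share the same frames.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines of a maps file (e.g. "/proc/<pid>/maps") that contain
 * `needle`, such as a library name. Returns -1 if the file cannot be opened.
 */
int count_mappings(const char *maps_path, const char *needle)
{
    assert(maps_path != NULL && needle != NULL);

    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, needle) != NULL)
            count++;
    }
    fclose(f);
    return count;
}
```

Calling it on the maps files of two gedit pids with a library name as the needle would show whether both have that library mapped. An empty needle simply counts all mappings.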

When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

Does it copy the entire binary to the memory before it executes? I am interested in this question and want to change it into some other way. I mean, if the binary is 100M big (seems impossible), I could run it while I am copying it into the memory. Could that be possible?
Or could you tell me how to see the way it runs? Which tools do I need?
The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:
The kernel creates a process context (more-or-less, virtual machine)
Into that process context, it defines a virtual memory mapping that maps from RAM addresses to the start of your executable file
Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries
The kernel does a jmp into the startup routine of your program (for a C program, that's something like crtprec80, which calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable.
As you use malloc/new, the kernel creates read-write pages of RAM (without disc backing files) and adds them to your virtual address space.
If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the ( page file = swap file = swap partition = on-disc virtual memory ). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.
Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.
So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
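If you want to watch this happening, one rough way is to count the minor page faults caused by touching freshly mapped memory. The sketch below uses anonymous memory rather than an executable file, and the names `minor_faults` and `faults_from_touching` are made up for the example. It assumes a Linux-like system with `getrusage` and `MAP_ANONYMOUS`. The exact count depends on the kernel, so treat the number as indicative only.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults charged to this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map `npages` anonymous pages, touch each one once, and return how many
 * minor page faults the touching caused (-1 on error).
 */
long faults_from_touching(size_t npages)
{
    assert(npages > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = npages * (size_t)page;

    /* mmap only sets up the mapping; no RAM is attached to it yet. */
    volatile unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        mem[i * (size_t)page] = 1;   /* first touch of each page */
    long after = minor_faults();

    munmap((void *)mem, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

On a typical Linux system the result is usually close to the number of pages touched, which is the same demand-paging mechanism the kernel uses for the pages of your executable.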
If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).
Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.
(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same, regardless of the optimizations.
No, it only loads the necessary pages into memory. This is demand paging.
I don't know of a tool which can really show that in real time, but you can have a look at /proc/xxx/maps, where xxx is the PID of your process.
While you ask a valid question, I don't think it's something you need to worry about. First off, a binary of 100M is not impossible. Second, the system loader will load the pages it needs from the ELF (Executable and Linkable Format) into memory, and perform various relocations, etc. that will make it work, if necessary. It will also load all of its requisite shared library dependencies in the same way. However, this is not an incredibly time-consuming process, and one that doesn't really need to be optimized. Arguably, any "optimization" would have a significant overhead to make sure it's not trying to use something that hasn't been loaded in its due course, and would possibly be less efficient.
If you're curious what gets mapped, as fge says, you can check /proc/pid/maps. If you'd like to see how a program loads, you can try running a program with strace, like:
strace ls
It's quite verbose, but it should give you some idea of the mmap() calls, etc.
