What and where exactly is the loader? - c

I understand every bit of the C compilation process (how the object files are linked to create the executable). But about the loader itself (which starts the program running) I have a few doubts.
Is the loader part of the kernel?
How exactly is the ./firefox or some command like that loaded? I mean you normally type such commands into the terminal which loads the executable I presume. So is the loader a component of the shell?
I think I'm also confused about where the terminal/shell fits into all of this and what its role is.

The format of an executable determines how it will be loaded. For example, executables with "#!" as the first two characters are loaded by the kernel by executing the named interpreter and passing the file to it as an argument. If the executable is formatted as a PE, ELF, or Mach-O binary, then the kernel uses an interpreter for that format that is built into the kernel in order to find the executable code and data and then choose the next step.
In the case of a dynamically linked ELF, the next step is to execute the dynamic loader (usually ld.so) in order to find the libraries, load them, and resolve the symbols. This all happens in userspace. The kernel is more or less unaware of dynamic linking, because it all happens in userspace after the kernel has handed control to the interpreter named in the ELF file.
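To make the "format determines the loader" idea concrete, here is a minimal sketch that guesses a file's format from its first bytes. The enum labels and the function name classify_executable are made up for this illustration; a real kernel dispatches to its own per-format handlers and checks far more than the magic number.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for this sketch only. */
enum exe_format { FMT_UNKNOWN, FMT_SCRIPT, FMT_ELF, FMT_PE, FMT_MACHO };

/* Guess the format of an executable from the first bytes of the file. */
enum exe_format classify_executable(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return FMT_SCRIPT;                 /* "#!" interpreter line */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;                    /* ELF magic: 0x7f 'E' 'L' 'F' */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;                     /* DOS "MZ" stub that PE files begin with */
    if (len >= 4 && (memcmp(buf, "\xcf\xfa\xed\xfe", 4) == 0 ||
                     memcmp(buf, "\xce\xfa\xed\xfe", 4) == 0))
        return FMT_MACHO;                  /* little-endian Mach-O, 64-bit or 32-bit */
    return FMT_UNKNOWN;
}
```

Note that a plain DOS ".exe" also starts with "MZ", so the PE case here is only a first guess.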

The corresponding system call is exec. It is part of the kernel and is in charge of tearing down the old address space of the process that makes the call and setting up a fresh one with everything needed to run the new code. This is part of the kernel because the address space is a kind of sandbox that protects processes from one another, and since it is critical, the kernel is in charge of it.
The shell is just in charge of interpreting what you type and transforming it into the proper structures (lists or arrays of C strings) to pass to some exec call (after having, most of the time, spawned a new process with fork).
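The shell's part of the job can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the name run_command and the MAX_ARGS limit are invented here, and splitting on whitespace ignores quoting, globbing, redirection and everything else a real shell does.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 64   /* arbitrary limit for this sketch */

/* Split `line` on whitespace (in place), run it, and return its exit status, or -1. */
int run_command(char *line)
{
    char *argv[MAX_ARGS + 1];
    int argc = 0;

    for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;                 /* exec expects a NULL-terminated array */
    if (argc == 0)
        return -1;

    pid_t pid = fork();                /* the shell itself must survive, so fork first */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);         /* only returns if the exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Everything after execvp, such as recognizing the file format and mapping the program, is the kernel's work and not the shell's.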

Related

How to run an arbitrary script or executable from memory?

I know I can use a call like execl("/bin/sh", "sh", "-c", some_string, (char *)NULL) to interpret a "snippet" of shell code using a particular shell/interpreter. But in my case I have an arbitrary string in memory that represents some complete script which needs to be run. That is, the contents of this string/memory buffer could be:
#! /bin/bash
echo "Hello"
Or they might be:
#! /usr/bin/env python
print "Hello from Python"
I suppose in theory the string/buffer could even include a valid binary executable, though that's not a particular priority.
My question is: is there any way to have the system launch a subprocess directly from a buffer of memory I give it, without writing it to a temporary file? Or at least, a way to give the string to a shell and have it route it to the proper interpreter?
It seems that all the system calls I've found expect a path to an existing executable, rather than something low level which takes an executable itself. I do not want to parse the shebang or anything myself.
You haven't specified the operating system, but since #! is specific to Unix, I assume that's what you're talking about.
As far as I know, there's no system call that will load a program from a block of memory rather than a file. The lowest-level system call for loading a program is the execve() function, and it requires a pathname of the file to load from.
Simple answer: no.
Detailed answer:
execl and the shebang convention are POSIXisms, so this answer will focus on POSIX systems. Whether the program you want to execute is a script utilizing the shebang convention or a binary executable, the exec-family functions are the way for a userspace program to cause a different program to run. Other interfaces such as system() and popen() are implemented on top of these.
The exec-family functions all expect to load a process image from a file. Moreover, on success they replace the contents of the process in which they are called, including all memory assigned to it, with the new image.
More generally, substantially all modern operating systems enforce process isolation, and one of the central pillars of process isolation is that no process can access another's memory.
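Since the exec-family functions want a file, the fallback the question was hoping to avoid looks roughly like the sketch below: write the buffer to a temporary file, make it executable, and let the kernel sort out the shebang or binary format. The function name run_from_buffer and the /tmp/inmem-XXXXXX template are assumptions made for this example, and it presumes that /tmp allows execution.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write `len` bytes of `script` to a temp file, execute it, return its exit status or -1. */
int run_from_buffer(const char *script, size_t len)
{
    char path[] = "/tmp/inmem-XXXXXX";     /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = write(fd, script, len) == (ssize_t)len && fchmod(fd, 0700) == 0;
    close(fd);                             /* close before exec to avoid ETXTBSY */

    int result = -1;
    pid_t pid = ok ? fork() : -1;
    if (pid == 0) {
        execl(path, path, (char *)NULL);   /* kernel inspects "#!" or the binary header */
        _exit(127);
    }

    int status;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    unlink(path);                          /* clean up once the child has finished */
    return result;
}
```

The caller never parses the shebang; the kernel does that when execl is handed the path.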

Is the linker retained after a call to exec?

I'm trying to watch the linker load libraries and search for symbols in zygote on Android.
Zygote is started by init (or a start/stop zygote command on the CLI). In the init.*.rc files, I've modified it to have an environment variable of LD_DEBUG=2 (via a setenv LD_DEBUG 2 in the init.zygote[32|64].rc file). In the linker code, this should print debugging information to logcat, which it does if you invoke a program from the command line, ie: "LD_DEBUG=2 ./myprogram".
However, this does not work for zygote. I took a look at /proc/[zygote's pid]/environ, and the LD_DEBUG=2 is definitely there.
The init code for a service (among other things) sets up the environment array, calls fork(), does a little more work, and then calls exec() with the array it created.
So, I'm wondering, is there any way that the linker would be retained across this fork-exec? I didn't think that was possible, since I thought an exec completely wiped a process's memory space.
I could see how, if it is retained, it wouldn't be invoked again, since the linker has already been loaded, but if the process memory space is being wiped, this doesn't make sense.
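For reference, the fork-then-exec pattern described in this question can be sketched as follows. This is not Android's init code; spawn_with_env is a name invented for the example, and it only shows that the environment array given to execve is what the freshly loaded image, dynamic linker included, gets to see.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, then replace the child with `path`, giving it exactly `envp` as its environment. */
int spawn_with_env(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);   /* on success the old process image is replaced */
        _exit(127);                 /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an envp such as { "LD_DEBUG=2", NULL }, the variable is part of the new program's environment from its very first instruction.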

Get Linux kernel module ko file name within running module

Is there a simple way, within a running Linux kernel module, to determine the full file name of the .ko file (i.e. /lib/modules/$(uname -r)/kernel/drivers/mymodule.ko) associated with the module, without traversing procfs, relying only on internal structures/lists available to kernel-space code?
You cannot obtain the path to the module file from within the kernel: the kernel doesn't store it. Moreover, the kernel never even learns that path.
There are two syscalls for loading a kernel module: init_module and finit_module (both are defined in kernel/module.c). The first accepts a pointer to a user-space area where the module image resides (user space must read the module file into that area beforehand). The second accepts a descriptor for the module's file, but this descriptor is used only to read the contents of the file and isn't stored.
No.
First: your module may have been compiled into the kernel, and thus won't have a file path.
Second: Loading kernel modules from files takes place in userspace. The kernel is passed a module as a data buffer, using the init_module system call -- it's theoretically possible that this data was never loaded from a file at all. (For instance, one can imagine a module loader that loads modules from the network, or from a compressed archive.)
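Seen from user space, the two syscalls mentioned in these answers look roughly like this. The wrapper names load_module_image and load_module_fd are invented for this sketch (glibc provides no wrappers, hence the raw syscall() calls), and actually loading a module requires the appropriate privileges.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for the declaration of syscall() */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hand an in-memory module image to the kernel; 0 on success, -1 with errno set. */
int load_module_image(const void *image, unsigned long len, const char *params)
{
    /* Only a buffer, its length and a parameter string: no path is passed. */
    return (int)syscall(SYS_init_module, image, len, params);
}

/* Same idea, but the kernel reads the image from an already-open file descriptor. */
int load_module_fd(int fd, const char *params)
{
    return (int)syscall(SYS_finit_module, fd, params, 0);
}
```

In neither call does a file name cross into the kernel, which is why the module cannot later ask where it came from.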

Load an exe file and call a function from it in DOS

I have a program (A) and there is another executable file (B) in the same folder. I must call a function from this other program (B) in my program (A). And all this must be done in DOS. How can I do it, or what should I read to do this? Please help.
If your two programs are separate executable files, then they will most likely run in two different processes. You cannot just call functions across two different processes; you need to use some inter-process communication mechanism.
You need to start understanding the basics & make a start somewhere and this seems to be a good place to do so.
Since you mention DOS as the target platform: DOS is a non-preemptive, single-user, single-tasking environment, but TSRs (terminate-and-stay-resident programs) can still emulate a form of multiprocessing in it. To implement IPC in DOS you will have to arrange for the TSR to hook a software interrupt, and then communicate with it through that.
MS-DOS is a 16-bit OS. The executables that run in MS-DOS come in two flavours: ".exe" and ".com". Think of the ".com" as a ".exe" with lots of default values assumed by the OS. The ".exe" files contain a header which is read by the OS to determine various parameters. One of these parameters is the entry point address. Only one entry point address is defined (and for a ".com" it is always cs:0x100) and that is the address the OS jumps to when the program has been loaded.
MS-DOS has functions to load another executable and run it, but it can only run from the address given in the header. No other function address is exported, so you can't just call some arbitrary function in the other executable. There is no DLL system in MS-DOS.
So, in order to call some arbitrary function in the second executable, you need to create your own DLL-style system. This is not trivial since the OS uses a segmented memory model, that is, memory is addressed through 64k segments and physical addresses are formed from the segment value and an offset as segment*16 + offset. So there are up to 2^12 ways to express the same physical address. During the loading process, MS-DOS has to fix up these segment values to reflect the actual location in memory the program has been loaded to. Remember, in MS-DOS there is no virtual memory. If you were to create your own DLL system, you would need to do this fixing-up yourself for code that's bigger than 64k (code+data of less than 64k can ignore segments and treat all addresses as just 16-bit offsets).
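The segment arithmetic and the fix-up step can be illustrated with a small sketch. It is written as portable C rather than for a DOS compiler, the names linear_address and apply_fixup are made up for the example, and wrap-around at the 1 MB boundary is ignored.

```c
#include <stdint.h>

/* Real-mode address: segment * 16 + offset (no 1 MB wrap-around handling). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/*
 * One relocation fix-up: a segment value stored in the image is relative to
 * the start of the program, so the loader adds the segment the program was
 * actually loaded at.
 */
void apply_fixup(uint16_t *segment_word, uint16_t load_segment)
{
    *segment_word = (uint16_t)(*segment_word + load_segment);
}
```

For example, 0x1234:0x0010 and 0x1235:0x0000 both give linear address 0x12350, which is the aliasing the answer describes.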
Even if you knew the address, loading the ".exe" using the MS-DOS API would still be tricky, as you'd need to know the CS (code segment) value the executable has been loaded at.

How to find signal handlers definitions in Linux kernel?

I am currently working on "Creation of Postmortem data logger in Linux on Intel architecture".
It's nothing but the creation of a core dump utility.
Can anybody share details about how the handling of the signals that produce a core dump when an application crashes (SIGSEGV, SIGABRT, SIGFPE, etc.) is implemented internally in the Linux kernel? I need to rewrite these signal handlers for my own specific needs and rebuild the kernel, so that my kernel produces a core file (when an application crashes) that includes things like registers, a stack dump, and a backtrace.
Thanks in advance to all who reply :)
You may not need to modify the kernel at all - the kernel supports invoking a userspace application when a core dump occurs. From the core(5) man page:
Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as a program to be executed. Instead of being written to a disk file, the core dump is given as standard input to the program.
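A userspace handler for the piped form quoted above mostly needs to drain its standard input. The sketch below shows just that core loop; the name copy_core is invented here, and a real handler would add its own analysis (registers, backtrace and so on) around it.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd; return the number of bytes copied, or -1. */
long copy_core(int in_fd, int out_fd)
{
    char buf[65536];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;                  /* end of the dump */
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted, just retry */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) { /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

A handler's main would call copy_core(STDIN_FILENO, fd) with fd opened on the desired output file, and it would be registered by writing a line such as |/path/to/handler (a placeholder path) to /proc/sys/kernel/core_pattern.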
The actual dumping code depends on the format of the dump. For the ELF format, look at the fs/binfmt_elf.c file. It has an elf_core_dump function. (The same goes for other formats.)
This is triggered by get_signal_to_deliver in kernel/signal.c, which calls into do_coredump in fs/exec.c, which calls the handler.
LXR, the Linux Cross-Reference, is usually helpful when you want to know how something is done in the Linux kernel. It's a browsing and searching tool for the kernel sources.
Searching “core dump” returns a lot of hits, but two of the most promising-looking are in fs/exec.c and fs/proc/kcore.c (promising because the file names are fairly generic, in particular you don't want to start with architecture-specific stuff). kcore.c is actually for a kernel core dump, but the hit in fs/exec.c is in the function do_coredump, which is the main function for dumping a process's core. From there, you can both read the function to see what it does, and search to see where it's called.
Most of the code in do_coredump is about determining whether to dump core and where the dump should go. What to dump is handled near the end: binfmt->core_dump(&cprm), i.e. this is dependent on the executable format (ELF, a.out, …). So your next search is on the core_dump struct field, specifically its “usage”; then select the hit corresponding to an executable format. ELF is probably the one you want, and so you get to the elf_core_dump function.
That being said, I'm not convinced from your description of your goals that what you want is really to change the core dump format, as opposed to writing a tool that analyses existing dumps.
You may be interested in existing work on analyzing kernel crash dumps. Some of that work is relevant to process dumps as well, for example the gcore extension to include process dumps in kernel crash dumps.
