I am writing a monolithic OS (it is a joke to call it an OS, but it does have very minimal, school-level functionality).
When I say monolithic, I mean it is compiled as a single binary blob with no support for a file system, etc. Currently I just have a rudimentary user space which is nothing but an infinite while loop.
I am planning to make my OS a little more useful and want to be able to write user apps which can terminate like regular apps on a full-blown OS.
I don't have glibc or equivalent. My current library in the user space is code which I have written.
Now my problem is how to add a framework for user-space apps which would let them terminate at a fixed point.
I do know how programs get compiled on regular systems and what happens when a program terminates. However, in my case I don't have the luxury of compiling programs against libraries, and if a program terminates then my instruction pointer just goes on a wild detour.
Currently I make every app end with a "return", and I pre-populate the app's stack with a fixed return address during launch. Is there any better way to handle the problem?
Along with the answer, I would be more than happy to get clarity on some OS concepts.
I am working on an x86 emulator platform and compiling my binary statically. (I do have virtual memory support.)
Hand-crafting the first stack frame with a return into whatever process cleanup code you need to run seems like a perfectly reasonable method. If your OS has "syscalls" then user-space process cleanup code (maybe called exit()) probably ends with a call to the _exit() syscall. You still need to handle the case where the program tries to execute code in "la-la land" because that can still happen (however doing that before you have a page-protection system might be a hard problem).
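To make the hand-crafted first frame concrete, here is a minimal sketch in C. It assumes a 32-bit cdecl-style layout where the return address sits at the top of the stack and the first argument just above it. The names build_initial_stack, push_word and exit_stub are hypothetical and not part of any existing kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Push one word onto a downward-growing stack and return the new stack pointer. */
static uintptr_t *push_word(uintptr_t *sp, uintptr_t value) {
    *--sp = value;
    return sp;
}

/*
 * Build the first frame of a new app (hypothetical helper).
 * stack_top : one past the highest usable word of the app's stack
 * exit_stub : address of the cleanup code the app should "return" into
 * arg       : single argument handed to the app's entry function
 * When the entry function executes its final ret, the CPU pops exit_stub
 * and continues in the cleanup code instead of wandering off.
 */
uintptr_t *build_initial_stack(uintptr_t *stack_top, uintptr_t exit_stub,
                               uintptr_t arg) {
    assert(stack_top != NULL);
    uintptr_t *sp = stack_top;
    sp = push_word(sp, arg);        /* argument, found just above the return address */
    sp = push_word(sp, exit_stub);  /* fake return address */
    return sp;                      /* load this into the stack pointer before jumping to the entry */
}
```

The kernel would load the returned pointer into the app's stack pointer and then jump (not call) to the entry point, so the frame looks exactly as if exit_stub had called the app.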
I am curious about how statically linked C executables would work in different environments. Let's say we compile our C code to target x86 macOS and we statically include everything it uses in the executable as well (print, strlen). What really stops this executable from running on a Windows OS if we include every library it needs? I understand the file format could be different and break, but other than that would this technically be able to run?
I see where you're coming from: operating systems make us think, as programmers, that libraries are the be-all and end-all of programming, that a call to a library is all you need to make complex things happen and that everything is contained in them.
But the truth is, libraries mostly provide an abstraction layer. As an example, let's create a library called "hello_world.so" which prints "Hello World!" to the console. That library relies on stdio to handle the complex I/O stuff, but stdio itself depends on at least one other thing: the kernel (some specific targets work without a kernel, but these systems are outside the scope of this answer).
In the desktop world, things can get really complicated, we have several hundreds of processes all running at once even in an idle system, all these apps need access to the hardware (possibly at once too) so it was decided a controller was needed, some piece of software that would coordinate all other software running on the same computer. This piece of software is usually called a kernel. On Windows it's the NT kernel, on macOS it's the XNU and on Linux it's... the Linux kernel!
On these systems, the biggest job of a library is to abstract the kernel, to make us believe printing text on a Linux or a Windows console works the exact same way when actually it can be completely different! Libraries like stdio/time/etc. have different "implementations" but the same "interface": they look the same from the developer's point of view, but the way they achieve their goals can vary wildly; they can do conversions, calls to other hidden or non-hidden functions... All this is completely portable from one OS to the other, though. Things start to go south for your idea when kernel calls start to show up.
Kernel calls are ways a program can "talk" to the kernel. They can be used to do literally thousands of different things, but for example there's one (or several) to ask for memory (usually reached through malloc), one to print to the console, one to ask if a network packet arrived, one to ask to talk to your GPU... And these system calls are completely different from one kernel to the other, sometimes even for two versions of the same kernel!
These "kernel calls" are the main thing preventing you from running statically compiled Linux programs on Windows.
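To show what such a kernel call looks like once the library layer is removed, here is a small Linux-only sketch. It uses the real syscall() wrapper and the SYS_write number from <sys/syscall.h>. The helper name raw_write is made up for illustration. The same request on another kernel would use a different number and calling convention, which is exactly the portability problem described above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Write a string to a file descriptor by issuing the Linux write system
 * call directly, bypassing stdio. Returns the number of bytes written,
 * or -1 on error (errno is set by the syscall() wrapper).
 */
long raw_write(int fd, const char *text) {
    assert(text != NULL);
    return syscall(SYS_write, fd, text, strlen(text));
}
```

Calling raw_write(1, "hello\n") asks the kernel to write to standard output, which is roughly what stdio ends up doing after its own buffering.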
PS: Even though all the above is true and kernels can be as different from one another as they wish, due to the history of kernels and of computing in general, some kernels actually share much of the same interface (even though their implementations, as you guessed, can be nothing alike). The best example I know of is how many kernels expose a UNIX-like interface.
This means that, although I have never tested it myself, a statically linked Linux app might run on Linux and on systems that provide a Linux-compatible system-call layer (FreeBSD has one). Where system-call numbers or executable formats differ, as on macOS, it would still not run unmodified.
The binary and libraries are specific to the operating system.
The TLDR is that the linking process translates function calls into addresses that point into the operating system's specific libraries. There are some differences, like alignment, that are settled at compile time, but the part responsible for your x86 instructions not running under a different OS is the linker.
Your compiler produces x86 instructions that are ready to execute, but the result is incomplete. The linker goes through every function call and gives it the address of that function in the executable file, even for functions of the standard library.
The linker creates the executable file following a file format which has a header with information like size, metadata and entry point. Windows and Unix have different specifications for executable files: Windows uses PE, and most Unix-like systems use ELF, both for executables and for libraries.
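As a rough illustration of how different those headers are, the sketch below tells the formats apart by their first bytes. The magic values are the documented ones (0x7F 'E' 'L' 'F' for ELF, "MZ" for the DOS stub that starts a PE file, and 0xFEEDFACF stored little-endian for 64-bit Mach-O). The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum exe_format { EXE_UNKNOWN, EXE_ELF, EXE_PE, EXE_MACHO };

/* Guess the executable format from the first bytes of a file image. */
enum exe_format identify_format(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    /* ELF: 0x7F followed by "ELF" (literal split so 'E' is not read as a hex digit) */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0) return EXE_ELF;
    /* 64-bit Mach-O, little-endian: magic 0xFEEDFACF */
    if (len >= 4 && memcmp(buf, "\xcf\xfa\xed\xfe", 4) == 0) return EXE_MACHO;
    /* PE files begin with the DOS header, whose magic is "MZ" */
    if (len >= 2 && memcmp(buf, "MZ", 2) == 0) return EXE_PE;
    return EXE_UNKNOWN;
}
```

A loader on each OS does a check along these lines before anything else and rejects a file whose header it does not understand.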
Through some hacks and non-trivial tricks it is possible to create an executable that can run on Windows and Unix (see αcτµαlly pδrταblε εxεcµταblε for how).
But even if you do all that, there is an obstacle that cannot be circumvented: the kernel. The kernel is, well, a kernel. It's the most important thing in an OS, and it provides a set of API calls that give basic, low-level access to computer resources. Functions like malloc are implemented on top of those kernel-specific calls: VirtualAlloc on Windows, and brk or mmap on Unix-like systems.
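For a feel of what sits underneath malloc on a Unix-like system, here is a minimal sketch that asks the kernel for memory directly with mmap. The mmap and munmap calls are the real POSIX ones. The wrapper names page_alloc and page_free are made up, and a real allocator does far more bookkeeping than this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for zero-filled, private, readable and writable memory. */
void *page_alloc(size_t size) {
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the memory back to the kernel; returns 0 on success. */
int page_free(void *p, size_t size) {
    return munmap(p, size);
}
```

On Windows the equivalent request would go through VirtualAlloc with entirely different parameters, so the same library function needs a different implementation per kernel.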
Main Answer
If your program does anything useful (print output, return a value, communicate on the network), it contains some form of system-call instruction. Each system-call instruction is a request to a particular operating system, and macOS system calls will not work on Windows and vice-versa.
The system-call instruction sends information to the operating system, including a number identifying which service is requested. The operating system then performs that service. When you build your program for macOS, it includes library routines that contain system-call instructions. If you execute those instructions on a Microsoft Windows system, Windows will not understand the macOS requests. It will interpret the information differently, and the program will not work.
So, in theory, there is nothing preventing you from writing your own program loader that reads a Mach-O executable file intended for macOS, loads its contents into memory, and transfers control to its entry point. But the program will not work because of the system calls.
Supplement
You might consider translating all the system calls in the program. Changing the primary number that identifies the service request might be feasible; it might not require changing the executable too much. For example, if 37 is a “write to file” request on macOS, your program loader might change it to 48 on Windows. However, the system calls also require other data be passed, such as pointers to buffers, lengths, and so on, and there are likely many discrepancies in how those are passed, so that macOS requests cannot be easily translated into Windows requests. Also, it can be technically challenging to identify all places in a program that a certain instruction is used—some of the contents of memory of a loaded program are instructions and some are data. Most normal programs may be well-behaved and easy to analyze in this regard, but not all are.
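The number-swapping part of that idea can be sketched as a lookup table. The 37 and 48 below are the made-up numbers from the paragraph above, not real macOS or Windows service numbers, and translate_syscall is a hypothetical name. Note that the sketch covers only the easy part: it does nothing about argument layout or about finding the instructions to patch.

```c
#include <assert.h>
#include <stddef.h>

struct syscall_map {
    int guest;  /* number used by the program's original OS */
    int host;   /* number the host OS expects for the same service */
};

/* Illustrative table; the single entry is the hypothetical example from the text. */
static const struct syscall_map syscall_table[] = {
    { 37, 48 },  /* "write to file" */
};

/* Return the host number for a guest system call, or -1 if there is no mapping. */
int translate_syscall(int guest_number) {
    assert(guest_number >= 0);
    for (size_t i = 0; i < sizeof syscall_table / sizeof syscall_table[0]; i++) {
        if (syscall_table[i].guest == guest_number) {
            return syscall_table[i].host;
        }
    }
    return -1;
}
```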
Another potential issue is that programs may expect to have certain modes set in the processor, and the host operating system may or may not have set those modes as needed.
I am creating a console program, which will have some resources like some threads and some sockets.
When the user closes the console program, should I detect this closing event and free those resources, or can I just let the OS handle this?
And do well known console programs (for example: ls, cat, grep in Linux) free their resources when they exit?
My question is not about a single OS (my console program will run on Windows and Linux and macOS).
When the user closes the console program, should I detect this closing event and free those resources, or can I just let the OS handle this?
Good code is re-used. What today is "closes the console program", tomorrow could be "return from a function" called console_program().
Plan for re-use and close/free allocated resources.
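One common way to plan for that in C is a single cleanup path at the end of the function, so it releases whatever it acquired whether it succeeds or fails. The sketch below uses two heap buffers as stand-ins for the sockets and threads from the question. The names console_session and SESSION_MAX_BUFFER, and the size limit itself, are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SESSION_MAX_BUFFER 4096u  /* arbitrary limit for this sketch */

/*
 * Acquire two resources, do some work, and release both on every path.
 * Returns 0 on success and -1 on failure.
 */
int console_session(size_t rx_size, size_t tx_size) {
    int rc = -1;
    char *rx = NULL;
    char *tx = NULL;

    assert(rx_size > 0 && tx_size > 0);
    if (rx_size > SESSION_MAX_BUFFER || tx_size > SESSION_MAX_BUFFER) goto cleanup;

    rx = malloc(rx_size);
    if (rx == NULL) goto cleanup;
    tx = malloc(tx_size);
    if (tx == NULL) goto cleanup;

    memset(rx, 0, rx_size);  /* stand-in for the real work */
    memset(tx, 0, tx_size);
    rc = 0;

cleanup:
    free(tx);  /* free(NULL) is a no-op, so partial setups are handled too */
    free(rx);
    return rc;
}
```

Because nothing is left for process exit to tidy up, the function can later be called in a loop or from a larger program without leaking.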
Both other answers (so Luke's one and chux' one) make sense. It is a matter of perspective.
But cleaning up your mess makes debugging easier with valgrind.
If your program is serious enough to need a lot of debugging, you may want to facilitate that. If you choose to avoid cleanup for performance reasons (e.g. Luke's approach), you might however have some rare --cleanup-the-mess program option which forces it (and tries hard to keep valgrind happy) ...
But if you write things conceptually similar in high-view behavior to (Linux programs like:) cron, bash, guile, make, xslt, tidy, indent, convert, etc, so a shell program, or any kind of interactive interpreter which you would run (in most cases) for only a few minutes, you could reasonably decide to take Luke's approach. On the other hand, if you write a program which runs for a long time (some specialized server for example), you definitely want to avoid every memory leak (and you need to use valgrind).
Generally it is not required, and it's faster to let the OS take care of it. From a brief look at GNU coreutils source, many programs will simply call die() when encountering an error which will exit the process immediately.
In some systems there is a common C runtime, meaning that C programs share certain resources, so a resource leak in one program can impact other applications. Therefore it is essential that all applications release what is not in use.
There is a good discussion of the CRT in "What is the C runtime library?".
I am doing a project to learn how a program is executed in Linux. Basically, I am trying to replicate the functionality of execve by running a series of system calls in a c program to take an executable binary, load it into memory, and successfully run it.
Are there any relatively easy-to-understand online resources (or tips) I can use to learn how to do this? I don't have much experience with this, and I'm trying to learn. It seems like a fairly complicated task, and I'm completely stuck at the moment.
Thank you.
Your main problem here is that part of the exec system call is overriding the process descriptor in the kernel. It's something you can't do in userspace.
Even if you close all file descriptors, there are still plenty of other values you can't reach, nor can you free dynamically loaded libraries or release your own program's code pages (since they would be write-protected).
The basic approach to loading and running a code file would be to mmap it into memory, then clear the stack, parse the ELF headers and jump to the program's start function (an assembly jmp instruction, mind you). But there's much more to an ELF file, so it might not work without other initializations and dynamic linkage...
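The "parse the ELF headers" step could start with something like the sketch below, which validates a 64-bit ELF header and pulls out the entry address. It uses the Elf64_Ehdr type and the ELFMAG, SELFMAG, EI_CLASS and ELFCLASS64 constants from the Linux <elf.h> header. The function name elf64_entry is hypothetical, and a real loader would go on to walk the program headers and map each segment.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the entry point from an in-memory ELF image (for example one
 * obtained with mmap). Returns 0 and stores the address in *entry on
 * success, or -1 if the image is not a 64-bit ELF file.
 */
int elf64_entry(const unsigned char *image, size_t len, uint64_t *entry) {
    assert(image != NULL && entry != NULL);
    Elf64_Ehdr header;
    if (len < sizeof header) return -1;
    memcpy(&header, image, sizeof header);  /* copy to avoid unaligned access */
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0) return -1;
    if (header.e_ident[EI_CLASS] != ELFCLASS64) return -1;
    *entry = header.e_entry;
    return 0;
}
```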
I have been asked to write a program in C which should debug another program written in C and then store the value of each variable at every line, loop or function in a log file.
I have been searching over the internet and I found articles on debugging using gdb.
Can I somehow use GDB in my program for this purpose and then store the values of each variable line by line?
I've got basic knowledge of C/C++ so please reply in simple terms.
Thanks
Debuggers depend on some special capability of the hardware, which must be exposed by the operating system (if any).
The basic idea is that the hardware is configured to transfer control to a debugger stub either after every instruction of the target program, or after certain types of instructions such as system calls, or those meeting a hardware breakpoint condition. Typically this would look like an interrupt, supervisor exception, or the like - very platform-specific detail.
As mentioned in comments, on Linux, you use the ptrace functionality of the kernel to interact with the debugger support provided by the hardware and the kernel, abstracting away a lot of the hardware-unique detail and managing the permission issues. Typically you must either be the same user id as the process being debugged, or be the superuser (root). Linux's ptrace also gives you an indirect ability to do things like access the memory (literally, address space) of the target application, something critical to debugger functionality which you cannot ordinarily do from another user-mode program on a multitasking operating system.
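As a very small taste of ptrace, the Linux-only sketch below forks a child, lets it volunteer to be traced, and then reads one word out of the child's address space from the parent. The ptrace requests used (PTRACE_TRACEME, PTRACE_PEEKDATA, PTRACE_CONT) are real. The function name peek_child_secret and the variable secret are made up, and it assumes the environment permits ptrace (some sandboxes do not).

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile long secret = 1;  /* the parent's copy keeps this value */

/* Returns the value of secret as seen in the child, or -1 on failure. */
long peek_child_secret(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {                      /* child: the debug target */
        secret = 1234;                   /* changes only the child's copy */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) _exit(1);
        raise(SIGSTOP);                  /* stop so the parent can inspect us */
        _exit(0);
    }

    int status = 0;                      /* parent: the "debugger" */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status)) return -1;

    errno = 0;                           /* PEEKDATA reports errors via errno */
    long value = ptrace(PTRACE_PEEKDATA, pid, (void *)&secret, NULL);
    int failed = (value == -1 && errno != 0);

    ptrace(PTRACE_CONT, pid, NULL, NULL);  /* let the child run to its exit */
    waitpid(pid, &status, 0);
    return failed ? -1 : value;
}
```

The address of secret is the same in both processes because the child is a fork of the parent. A real debugger would instead get addresses from the debug metadata left behind by the compiler.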
Other operating systems will have different methods. Some embedded targets use debug pods which connect your development machine to the embedded board by a few wires. In other cases, debug capability built into the hardware is managed by a small program running on the target processor, which then talks back over a serial or network port to the full debugger program residing on the development machine.
A program such as GDB can do more than just the basics of setting debug stop conditions, dumping registers, and dumping program instructions. Much of its code deals with annotating what it displays based on debug metadata optionally left behind by compilers, walking back through stack frames, and giving the user powerful tools to configure all of this - and of course it does most of this in a target-independent way, with the target-unique code mostly confined to a few interchangeable directories.
You can indeed "drive" GDB from another program - many, many GUI type debuggers do exactly that, existing as graphical front ends for GDB. However, if you were assigned to write a debugger, doing it that way may or may not be consistent with your assignment.
I googled about it and somewhere I read ....
Yes, you can. That is happening in the case of embedded systems
I think NO, it's not possible. Any platform must have an operating system. Or else, your program must itself be an OS.
Either soft or hard-wired. Without an operating system your component wouldn't work.
Am I right, or can anybody explain the answer to me? (I don't have any idea about embedded systems...)
Of course you can. All a (typical) CPU needs is power and access to a memory, then it will execute its hard-coded boot sequence.
Typically this will involve reading some pre-defined address, interpreting the contents there as instructions, and starting to run them.
These instructions could of course come from a C program, although at this level it's more common to write the very early stages (called bootstrapping) in assembly.
This of course doesn't mean, if I were to read your question title literally, that any C program can be run this way. If the program assumes there is an OS, but there isn't, it won't work. This should be pretty obvious.
You can run a program in a system without an Operating System ... and that program need not be an Operating System itself.
Think about all the computers (or processors if you prefer) inside a car: engine management, air conditioning, ABS, ..., ...
All of those systems have a program (possibly written in C) running. None of the processors have an OS.
The Standard specifically differentiates between hosted implementations and freestanding implementations:
5.1.2.1 Freestanding environment
1 In a freestanding environment (in which C program execution may take place
without any benefit of an operating system), the name and type of the
function called at program startup are implementation-defined. Any library
facilities available to a freestanding program, other than the minimal set
required by clause 4, are implementation-defined.
2 The effect of program termination in a freestanding environment is
implementation-defined.
5.1.2.2 Hosted environment
1 A hosted environment need not be provided, but shall conform to the
following specifications if present.
...
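A program can ask the compiler which of the two environments it is being built for through the standard __STDC_HOSTED__ macro (1 for hosted, 0 for freestanding, defined since C99). The sketch below is only an illustration and the function name is_hosted is made up. With GCC or Clang, compiling with -ffreestanding makes the macro 0.

```c
#if __STDC_HOSTED__
#include <assert.h>  /* only a hosted implementation must provide this header */
#endif

/* Returns 1 when compiled for a hosted environment, 0 when freestanding. */
int is_hosted(void) {
    return __STDC_HOSTED__;
}
```

Code meant to run both on bare metal and under an OS can use the same macro to leave out anything that needs the full standard library.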
I think you would have fun writing 'toy' kernels that are designed to run under simulators like QEMU (or virtualization platforms, Xen + MiniOS is one of my favorites). Without (much) difficulty, you could get a basic console up and running and start printing things to it. It's really fun, educational and satisfying all at once.
If you are working on x86 .. and get your spiffy kernel working under QEMU .. there's a very good chance that it will also work on real hardware. You might enjoy it.
Anyway, the answer to your question is most decidedly yes. It's especially easy if you happen to be using a boot loader .. for instance, google memtest86 and grab the code.
Usually, any C program will have a variety of system calls which depend on the operating system. For example, printf makes a system call to write to the screen buffer. Opening files and things like that are also system calls.
So basically, you can run the C code which just gets compiled and assembled into machine code on a processor, but if the code makes any system calls, it would just freeze up the processor when it tries to jump to a memory location that it thinks is the operating system. This of course would depend on your being able to get the program running in the first place, which is not easy without the operating system as well.
Embedded systems are legitimate OSs in their own right, they're just not general-purpose OSs. Any userland program (i.e. a program that is not itself an operating system) needs an operating system to run on top of.
As an example: Building Bare-Metal ARM Systems with GNU
Many embedded systems do not have enough resources for a full OS, some may use a scheduler kernel or RTOS, others are coded 'bare metal'. The main() C entry point is entered after reset. Only a small amount of assembler code is required to initialise a microprocessor, to execute C code. All C requires to run generally is a stack - usually simply a case of initialising the stack pointer to a specific address. Some processor specific initialisation of interrupt/exception vectors, system clocks, memory controllers etc. may be necessary also.
On a desktop PC, typically you have a BIOS that handles basic hardware initialisation such as SDRAM controller setup and timing, and then bootstrapping from a disk boot-sector, which then in turn bootstraps an OS. Any of that code could be written in C (and some of it probably is), and it could do something other than boot an OS - it could do anything - it is just code.
OSs are useful for non-dedicated computing devices where the end user may select one of many programs to execute and possibly several simultaneously. Most embedded systems do just one thing, the software is often loaded from ROM or executes directly from ROM, and is never changed and executes indefinitely (usually stopped only by power-down).
You still of course might implement device drivers and the like, but often these are an integral part of the application rather than a separate entity. Even when you do use an RTOS in an embedded system, it is still generally integral to your application rather than an OS in the sense you might understand. In these cases the RTOS is simply a library like any other, and is often initialised and started from main() rather than the other way around as you might expect.
Every piece of hardware has to have a piece of software that operates it, be it embedded firmware (smaller and relatively fixed, like VxWorks) or operating system software that can run complex arbitrary code on top of it (like Windows, Linux, or Mac).
Think of it as a stack. At the bottom, you have the hardware. On top of that, a piece of software that can control that hardware. On top of that, you can have all sorts of stuff. In the case of a VoIP phone, you'll have VxWorks controlling the hardware, and a layer on top of that that handles all the phone applications.
So going back to your question: yes, you CAN run any C program on anything, BUT it depends what kind of C program it is. If it's a low-level C program that can talk to hardware, then you don't need anything other than your program and the hardware. If it's a higher-level C program (like a chat program), then you need a whole bunch of stuff between your program and the hardware.
make sense?
Obviously, you cannot execute any arbitrary C program without some sort of OS or OS-equivalent. Similarly, I can write a C program under Linux that won't run under Microsoft Windows.
However, you can write C programs on almost anything. It's a popular language to write software for embedded systems in, and they very often don't have an OS.
Many embedded systems have just a CPU hooked up to a ROM, with pins coming out of the chip that are directly attached to inputs and outputs. There is no user I/O, no file system, no process scheduling, nothing you'd typically want an OS for. In those cases, a C programmer might write a program that is burned into a ROM, which will handle everything itself.
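"Handling everything itself" usually means reading and writing hardware registers directly through volatile pointers. The sketch below shows the idea with hypothetical helper names (pin_set, pin_clear, pin_read). On a real chip the register pointer would be a fixed address taken from the datasheet. Here it is passed in so the code can also be tried on a desktop.

```c
#include <assert.h>
#include <stdint.h>

/* Drive one output pin high by setting its bit in a 32-bit port register. */
void pin_set(volatile uint32_t *reg, unsigned pin) {
    assert(pin < 32);
    *reg |= (UINT32_C(1) << pin);
}

/* Drive one output pin low by clearing its bit. */
void pin_clear(volatile uint32_t *reg, unsigned pin) {
    assert(pin < 32);
    *reg &= ~(UINT32_C(1) << pin);
}

/* Read the current state of a pin: 1 if its bit is set, 0 otherwise. */
int pin_read(const volatile uint32_t *reg, unsigned pin) {
    assert(pin < 32);
    return (int)((*reg >> pin) & 1u);
}
```

The volatile qualifier stops the compiler from caching or removing the accesses, which matters because the hardware can change the register behind the program's back.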
(Some embedded systems are more complicated, and can use an OS. Linux is frequently used, since it's free for the use, can be made very compact, and can be changed at any level. Not all do, though.)
You definitely don't need an OS to run your C code on any system. What you will need is two pieces of initialization code - one to initialize the hardware needed (processor, clock, memory) and another to set up your stack and C runtime (i.e. initialization of data and BSS sections). This, of course, means that you cannot take advantage of the multithreading, messaging and synchronization services that an OS would provide. I'll try and break it down into some steps to give you an idea:
1. Write a "reset_routine" that runs when the board starts. This will initialize the clock and any external memory needed. (This routine will have to execute from a memory that is either internal or one that can be initialized and programmed externally.)
2. The reset_routine, after the hardware initializations, transfers control to a "sw_runtime_init" routine that will set up the stack and the globals defined by your application. (Do a jump from reset_routine to sw_runtime_init instead of a call to avoid stack usage.)
3. Compile and link this with your application, whilst ensuring that the "reset_routine" is linked to the location that the reset vector points to.
4. Load this onto your target and pray.
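The data and BSS part of sw_runtime_init might look roughly like the sketch below. On a real target the five arguments would come from symbols defined in the linker script, and the stack pointer would be set in assembly before any C runs. Here they are plain parameters (an assumption made so the routine can be tried on a desktop), and the assert is only a host-side sanity check that would be dropped on the target.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Prepare the C runtime's memory before main():
 *  - copy the initial values of initialized globals from ROM to RAM
 *  - zero the BSS section so uninitialized globals start at 0
 * Plain loops are used because memcpy/memset may not exist yet on bare metal.
 */
void sw_runtime_init(const unsigned char *data_load,  /* .data image in ROM */
                     unsigned char *data_start,       /* .data location in RAM */
                     size_t data_size,
                     unsigned char *bss_start,        /* .bss location in RAM */
                     size_t bss_size) {
    assert(data_load != NULL && data_start != NULL && bss_start != NULL);
    for (size_t i = 0; i < data_size; i++) {
        data_start[i] = data_load[i];
    }
    for (size_t i = 0; i < bss_size; i++) {
        bss_start[i] = 0;
    }
}
```

After this returns, the startup code can call main() (or the application's equivalent entry point) with globals in the state the C language promises.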