Is there any way to increase the stack size/recursion limit? - c

I'm writing a C program and am exceeding the recursion limit via a segmentation fault. Is there any way to increase the program's recursion limit (perhaps by increasing the stack size), either via an option to GCC or via a command-line option? The program is running on Ubuntu.

You can change the stack size with ulimit on Linux, for example:
ulimit -s unlimited
On Windows with Visual Studio, use the /F compiler option (or the /STACK linker option).

The stack size is a function of the operating system, though many earlier operating systems (MS-DOS, for example) didn't control the program's stack segment at all: it was up to the program to reserve an adequately sized segment.
With virtual memory and 32-bit APIs, the stack size is usually provided by a resource management mechanism. For example, on Linux, the ulimit command provides one source of stack size control. Other levels of control are provided by mechanisms inside the kernel enforcing system policy, memory limitations, and other limits.

Related

how can I do reproducible benchmarks in a storage system?

If the machine's memory is far larger than the cache configured for a storage system, the file system will cache far more data than the storage system's configured cache holds. So how can I do reproducible benchmarks across machines with different amounts of memory but the same configured storage-system cache?
Maybe try running a program that allocates and locks a bunch of memory (i.e. pin it so it can't be paged out), then sleeps. Kill it when you want to release the memory.
Specifically, I'm thinking of the mlock(2) POSIX system call, or the Linux-specific MAP_LOCKED flag for mmap(2). This requires root, since the default ulimit -l is only 64kiB for non-root users, at least on my Ubuntu desktop.
On an otherwise-idle system with nothing using much memory, it should be easy to detect the total present and lock all but 2GB of it, for example. It's probably less easy to choose a reasonable size to lock on systems with other processes running and using varying amounts of RAM.

Operating system kernel and processes in main memory

Continuing my endeavors in OS development research, I have constructed an almost complete picture in my head. One thing still eludes me.
Here is the basic boot process, from my understanding:
1) BIOS/Bootloader perform necessary checks, initialize everything.
2) The kernel is loaded into RAM.
3) Kernel performs its initializations and starts scheduling tasks.
4) When a task is loaded, it is given a virtual address space in which it resides, including the .text, .data, and .bss sections, plus the heap and the stack. The task "maintains" its own stack pointer, pointing into its own "virtual" stack.
5) Context switches merely push the register file (all CPU registers), the stack pointer and program counter into some kernel data structure and load another set belonging to another process.
In this abstraction, the kernel is a "mother" process inside of which all other processes are hosted. I tried to convey my best understanding in the following diagram:
Question is, first is this simple model correct?
Second, how is the executable program made aware of its virtual stack? Is it the OS job to calculate the virtual stack pointer and place it in the relevant CPU register? Is the rest of the stack bookkeeping done by CPU pop and push commands?
Does the kernel itself have its own main stack and heap?
Thanks.
Question is, first is this simple model correct?
Your model is extremely simplified but essentially correct - note that the last two parts of your model aren't really considered to be part of the boot process, and the kernel isn't a process. It can be useful to visualize it as one, but it doesn't fit the definition of a process and it doesn't behave like one.
Second, how is the executable program made aware of its virtual stack?
Is it the OS job to calculate the virtual stack pointer and place it
in the relevant CPU register? Is the rest of the stack bookkeeping
done by CPU pop and push commands?
An executable C program doesn't have to be "aware of its virtual stack." When a C program is compiled into an executable, local variables are usually referenced relative to the stack or frame pointer - for example, [ebp - 4].
When Linux loads a new program for execution, it uses the start_thread macro (which is called from load_elf_binary) to initialize the CPU's registers. The macro contains the following line:
regs->esp = new_esp;
which will initialize the CPU's stack pointer register to the virtual address that the OS has assigned to the thread's stack.
As you said, once the stack pointer is loaded, instructions such as pop and push change its value. The operating system is responsible for making sure that there are physical pages corresponding to the virtual stack addresses - in programs that use a lot of stack memory, the number of physical pages grows as the program continues its execution. There is a limit for each process, which you can find by using the ulimit -a command (on my machine the maximum stack size is 8 MB, i.e. 2048 pages of 4 KB each).
Does the kernel itself have its own main stack and heap?
This is where visualizing the kernel as a process can become confusing. First of all, threads in Linux have a user stack and a kernel stack. They're essentially the same, differing only in protections and location (kernel stack is used when executing in Kernel Mode, and user stack when executing in User Mode).
The kernel itself does not have its own stack. Kernel code is always executed in the context of some thread, and each thread has its own fixed-size (usually 8KB) kernel stack. When a thread moves from User Mode to Kernel Mode, the CPU's stack pointer is updated accordingly. So when kernel code uses local variables, they are stored on the kernel stack of the thread in which they are executing.
During system startup, the start_kernel function initializes the kernel init thread, which will then create other kernel threads and begin initializing user programs. So after system startup the CPU's stack pointer will be initialized to point to init's kernel stack.
As far as the heap goes, you can dynamically allocate memory in the kernel using kmalloc, which will try to find a free page in memory - its internal implementation uses get_zeroed_page.
You forgot one important point: Virtual memory is enforced by hardware, typically known as the MMU (Memory Management Unit). It is the MMU that converts virtual addresses to physical addresses.
The kernel typically loads the address of the base of the page table for a specific process into a register in the MMU. This is what task-switches the virtual memory space from one process to another. On x86, this register is CR3.
Virtual memory protects processes' memory from each other. RAM for process A is simply not mapped into process B. (Except for e.g. shared libraries, where the same code memory is mapped into multiple processes, to save memory).
Virtual memory also protects kernel memory from user-mode processes. Attributes on the pages covering the kernel address space are set so that, when the processor is running in user mode, it is not allowed to access or execute there.
Note that, while the kernel may have threads of its own, which run entirely in kernel space, the kernel shouldn't really be thought of as a "mother process" that runs independently of your user-mode programs. The kernel basically is "the other half" of your user-mode program! Whenever you issue a system call, the CPU automatically transitions into kernel mode, and starts executing at a pre-defined location, dictated by the kernel. The kernel system call handler then executes on your behalf, in the kernel-mode context of your process. Time spent in the kernel handling your request is accounted for, and "charged to" your process.
Helpful ways of thinking about the kernel in relation to processes and threads
The model you provided is very simplified but correct in general.
At the same time, thinking of the kernel as a "mother process" isn't the best mental model, though it has some merit.
I would like to propose two better models.
Try to think of the kernel as a special kind of shared library.
Like a shared library, the kernel is shared between different processes.
A system call is performed in a way conceptually similar to a routine call into a shared library.
In both cases, after the call, you execute "foreign" code, but in the context of your native process.
And in both cases your code continues to perform computations based on the stack.
Note also that in both cases calls into "foreign" code block the execution of your "native" code.
After the call returns, execution continues at the same point in the code, and with the same stack state, from which the call was made.
But why do we consider the kernel a "special" kind of shared library? Because:
a. The kernel is a "library" that is shared by every process in the system.
b. The kernel is a "library" that shares not only its code section, but also its data section.
c. The kernel is a specially protected "library". Your process can't access kernel code and data directly. Instead, it is forced to call into the kernel in a controlled manner via special "call gates".
d. In a system call, your application executes on a virtually continuous stack. But in reality this stack consists of two separate parts. One part is used in user mode, and the second part is logically attached to the top of your user-mode stack on entering the kernel and detached on exit.
Another useful way of thinking about the organization of computation in your computer is to consider it as a network of "virtual" computers, each without virtual memory support.
You can consider a process as a virtual multiprocessor computer that executes only one program, which has access to all memory.
In this model, each "virtual" processor is represented by a thread of execution.
Just as you can have a computer with multiple processors (or a multicore processor), you can have multiple concurrently running threads in your process.
Just as all the processors in your computer share access to the pool of physical memory, all the threads of your process share access to the same virtual address space.
And just as separate computers are physically isolated from each other, your processes are also isolated from each other, but logically.
In this model, the kernel is represented by a server with a direct connection to each computer in a star-topology network.
Similarly to a networking server, the kernel has two main purposes:
a. The server assembles all the computers into a single network.
Similarly, the kernel provides a means of inter-process communication and synchronization. The kernel acts as a man in the middle that mediates the entire communication process (transferring data, routing messages and requests, etc.).
b. Just as a server provides a set of services to each connected computer, the kernel provides a set of services to the processes. For example, just as a network file server lets computers read and write files located on shared storage, your kernel lets processes do the same things using local storage.
Note that, following the client-server communication paradigm, the clients (processes) are the only active actors in the network. They issue requests to the server and to each other. The server, in turn, is the reactive part of the system and never initiates communication; it only replies to incoming requests.
These models reflect the resource sharing/isolation relationships between the parts of the system and the client-server nature of the communication between kernel and processes.
How stack management is performed, and what role the kernel plays in it
When a new process starts, the kernel, using hints from the executable image, decides where, and how much, virtual address space will be reserved for the user-mode stack of the process's initial thread.
With that decided, the kernel sets the initial values of the processor registers that the process's main thread will use just after execution starts.
This setup includes setting the initial value of the stack pointer.
After the process actually starts executing, the process itself becomes responsible for the stack pointer.
More interestingly, the process is responsible for initializing the stack pointer of each new thread it creates.
But note that the kernel is responsible for allocating and managing the kernel-mode stack of each and every thread in the system.
Note also that the kernel is responsible for allocating physical memory for the stack, and usually performs this job lazily, on demand, using page faults as hints.
The stack pointer of a running thread is managed by the thread itself. In most cases, stack pointer management is performed by the compiler when it builds the executable image. The compiler usually tracks the stack pointer value and maintains its consistency by emitting and tracking all the instructions that relate to the stack.
Such instructions are not limited to "push" and "pop". Many CPU instructions affect the stack, for example "call" and "ret", "sub esp" and "add esp", etc.
So as you can see, the actual policy of stack pointer management is mostly static and known before the process executes.
Sometimes programs have a special part of their logic that performs custom stack management.
For example, implementations of coroutines or long jumps in C.
In fact, you are allowed to do whatever you want with the stack pointer in your program.
Kernel stack architectures
I'm aware of three approaches to this issue:
A separate kernel stack per thread in the system. This is the approach adopted by most well-known monolithic-kernel OSes, including Windows, Linux, Unix, and macOS.
While this approach leads to significant memory overhead and worsens cache utilization, it improves preemptibility of the kernel, which is critical for monolithic kernels with long-running system calls, especially in a multi-processor environment.
Actually, a long time ago Linux had only one shared kernel stack, and the entire kernel was covered by the Big Kernel Lock, which limited the number of threads that could concurrently perform a system call to just one.
But the Linux kernel developers quickly recognized that blocking the execution of one process that wants, for instance, to know its PID, because another process has already started sending a big packet through a very slow network, is completely inefficient.
One shared kernel stack.
The trade-off is very different for microkernels.
A small kernel with short system calls allows microkernel designers to stick to a design with a single kernel stack.
Given proof that all system calls are extremely short, they can benefit from improved cache utilization and smaller memory overhead, while still keeping system responsiveness at a good level.
A kernel stack for each processor in the system.
One shared kernel stack, even in microkernel OSes, seriously affects the scalability of the entire operating system in a multiprocessor environment.
Because of this, designers frequently follow an approach that is a compromise between the two approaches described above, and keep one kernel stack per processor (processor core) in the system.
In that case they benefit from good cache utilization and small memory overhead, much better than in the stack-per-thread approach and only slightly worse than in the single-shared-stack approach.
And at the same time they benefit from good scalability and responsiveness of the system.
Thanks.

How does kernel restrict processes to their own memory pool?

This is a purely academic question, not related to any particular OS.
We have an x86 CPU and operating memory; this memory resembles a memory pool consisting of addressable memory units that can be read or written using their address via the CPU's MOV instruction (we can move data from/to this memory pool).
Given that our program is the kernel, we have full access to this whole memory pool. However, if our program is not running directly on the hardware, the kernel creates some "virtual" memory pool which lies somewhere inside the physical memory pool; our process considers it just like the physical memory pool and can write to it, read from it, or change its size, usually by calling something like sbrk or brk (on Linux).
My question is: how is this virtual pool implemented? I know I could read the whole Linux source code and maybe in a year I'd find it, but I can also ask here :)
I suppose that one of these 3 potential solutions is being used:
Interpret the program's instructions (very inefficient and unlikely): the kernel would just read the program's byte code and interpret each instruction individually, e.g. if it saw a request to access memory the process isn't allowed to access, it wouldn't allow it.
Create some OS-level API that would need to be used in order to read/write memory, and disallow access to raw memory, which is probably just as inefficient.
A hardware feature (probably best, but I have no idea how that works): the kernel would say "dear CPU, now I will send you instructions from some unprivileged process, please restrict its memory accesses to the range 0x00ABC023 - 0xDEADBEEF", and the CPU wouldn't let the user process do anything wrong with memory outside the range approved by the kernel.
The reason I'm asking is to understand whether there is any overhead in running a program unprivileged behind the kernel (let's not consider overhead caused by multitasking implemented by the kernel itself) versus running the program natively on the CPU (with no OS), as well as the overhead in memory access caused by machine virtualization, which probably uses a similar technique.
You're on the right track when you mention a hardware feature. This is a feature known as protected mode; it was introduced to x86 by Intel with the 80286. It has evolved and changed over time, and currently x86 has 4 modes.
Processors start running in real mode, and later privileged software (ring 0, your kernel for example) can switch between these modes.
The virtual addressing is implemented and enforced using the paging mechanism (How does x86 paging work?) supported by the processor.
On a normal system, memory protection is enforced at the MMU, or memory management unit, which is a hardware block that configurably maps virtual to physical addresses. Only the kernel is allowed to directly configure it, and operations which are illegal or go to unmapped pages raise exceptions to the kernel, which can then discipline the offending process or fetch the missing page from disk as appropriate.
A virtual machine typically uses CPU hardware features to trap and emulate privileged operations or those which would too literally interact with hardware state, while allowing ordinary operations to run directly and thus with moderate overall speed penalty. If those are unavailable, the whole thing must be emulated, which is indeed slow.

When running a program in Windows, what dictates the allowable memory for that program?

If I were to write a program in C and run it in Windows, is there something in the Win API that dictates whether or not a certain block of memory can be accessed by the program? If I want to be able to have the program access any block of memory that I want, is there something I have to disable? I realize that this is risky and can result in damaging the operating system.
In modern Windows (Windows with the NT kernel) the operating system controls the way memory is accessed. So the answer is: no, there is nothing you can do about it. You won't be able to get your process to access ANY block of memory you want.
You could have done it in Win 3.0, Win 3.11, Win 95, Win 98, Win ME.
Yes, that's possible with VirtualAlloc(), the low level function that allocates virtual memory pages. The flProtect argument specifies how the memory can be accessed by the process, specifying PAGE_NOACCESS is possible, albeit that it is not exactly used very often.
If you are actually talking about RAM then no, a user mode program never has direct access to physical memory on a protected mode operating system like Windows. It can only ever address virtual memory, the mapping to RAM is performed by the OS kernel. Only code that runs in ring 0 has the capability. Denying access to certain physical addresses only makes sense for a memory-mapped I/O device. Which would already have a driver that reserves the address space.
You cannot/will not/must not access kernel memory. Modern operating systems don't allow user-mode code to touch those memory regions; they are accessible only in kernel mode.

How can you limit RAM consumption in a process?

How can you limit the physical memory consumption of a C program from within the source code, on a Linux 2.6.32 machine?
I need to determine the type of page replacement algorithm the system is using.
The problem is that without limiting the number of pages a process can have in memory, it becomes difficult to analyze the pattern of page faults to determine the page replacement algorithm.
Also, I don't have root access on the machine.
setrlimit(RLIMIT_MEMLOCK, ...).
