Multithreading implementation itself in C - c

I am a beginner C/C++ programmer first of all, but I am curious about it.
My question is more theoretical.
I heard that C does not have explicit multithreading (MT) support, however there are libraries which implement this. I found "process.h" header which has to be included for building MT programs, but the thing I don't understand is how the MT itself works.
I know there are threads in CPU (assume it's single core for simplicity) running and there is only one thread per moment. The CPU is switching between threads really fast so that user sees it as a simultaneous work (correct me if not).
But - what really happens when I write the following
beginthread( Thread, 0, NULL ) //or whatever function/class method we use
keeping in mind that C does not have MT support. I mean, how does code tell the PC to run two functions multithreaded while it is not possible by the language explicit methods? I guess there is some "cheat" inside library related to "process.h", but what is that cheat, I can't just find on the web.
To be more specific - I am not asking about how to use MT, but how is it build?
Sorry if was answered earlier, or question is too complicated :)
UPD:
Imagine we have C language. It has functions, variables, pointers etc. I dont know any "special" function type that can run concurrently with other. Unless there are calls to some other functions from it. But then the caller function stops and waits?
Is it so that when I run MT applications, there is a special "global" function that calls my f1() and f2() repeatedly which looks like they were simultaneously working?

First of all, C11 does actually add multithreading support to the standard, so the premise that C does not support multithreading is no longer entirely correct.
However, I'm assuming your question is more to do with how can multithreading be implemented by a C library when standard C does(/did) not provide the necessary tools. The answer lies in the word “standard” – compilers and platforms can provide additional functionality beyond that required by the standard. Using such extra features makes the program/library less portable (i.e., more is required than is specified in the C standard), but the language and function call semantics can still be C.
Perhaps it is helpful to consider a standard library function such as fopen – somewhere inside that function code must eventually be called which could not be written in standard C, i.e., the implementation of the standard library itself must also rely on platform-specific code to access operating system functionality such as the file system. Every implementation of the standard library must thus implement the non-portable parts in a way specific to that platform (this is kind of the point of having a standard library instead of all code being platform-specific). But likewise a multithreading library can be implemented with non-standard features provided by that platform, but using such a library makes the code portable only to the platforms for which the same (or compatible) multithreading library is available.
As for how multithreading itself works, it is certainly outside the scope of what can be answered here, but as a simplified conceptual model on a single processor core, you can imagine the operating system managing “concurrent” processes by running one process for a short time, interrupting it, saving its state (current instruction, registers, etc), loading the saved state of another process, and repeating this. This gives the illusion of concurrent execution though in actual fact it is switching rapidly between different processes. On multi-core systems the execution on different cores can actually be concurrent, but there are typically more processes than there are cores, so this kind of switching will still happen on individual cores. Things are further complicated by processes waiting for something (I/O, another process, a timer, etc). Perhaps it suffice to say that the scheduler is a piece of software managing all of this inside the operating system and the multithreading library communicates with it.
(Note that there are many different ways to implement multithreading and multitasking, and statements in the above paragraph do not apply to all of them.)

It's platform specific. On Windows it eventually goes down to NtCreateThread which uses the assembly instruction syscall to call the operating system. So you can qualify it as a cheat.
On Linux it's the same, just the function with the syscall is called clone instead.

Related

Does NtDll really export C runtime functions, and can I use these in my application?

I was looking at the NtDll export table on my Windows 10 computer, and I found that it exports standard C runtime functions, like memcpy, sprintf, strlen, etc.
Does that mean that I can call them dynamically at runtime through LoadLibrary and GetProcAddress? Is this guaranteed to be the case for every Windows version?
If so, it is possible to drop the C runtime library altogether (by just using the CRT functions from NtDll), therefore making my program smaller?
There is absolutely no reason to call these undocumented functions exported by NtDll. Windows exports all of the essential C runtime functions as documented wrappers from the standard system libraries, namely Kernel32. If you absolutely cannot link to the C Runtime Library*, then you should be calling these functions. For memory, you have the basic HeapAlloc and HeapFree (or perhaps VirtualAlloc and VirtualFree), ZeroMemory, FillMemory, MoveMemory, CopyMemory, etc. For string manipulation, the important CRT functions are all there, prefixed with an l: lstrlen, lstrcat, lstrcpy, lstrcmp, etc. The odd man out is wsprintf (and its brother wvsprintf), which not only has a different prefix but also doesn't support floating-point values (Windows itself had no floating-point code in the early days when these functions were first exported and documented.) There are a variety of other helper functions, too, that replicate functionality in the CRT, like IsCharLower, CharLower, CharLowerBuff, etc.
Here is an old knowledge base article that documents some of the Win32 Equivalents for C Run-Time Functions. There are likely other relevant Win32 functions that you would probably need if you were re-implementing the functionality of the CRT, but these are the direct, drop-in replacements.
Some of these are absolutely required by the infrastructure of the operating system, and would be called internally by any CRT implementation. This category includes things like HeapAlloc and HeapFree, which are the responsibility of the operating system. A runtime library only wraps those, providing a nice standard-C interface and some other niceties on top of the nitty-gritty OS-level details. Others, like the string manipulation functions, are just exported wrappers around an internal Windows version of the CRT (except that it's a really old version of the CRT, fixed back at some time in history, save for possibly major security holes that have gotten patched over the years). Still others are almost completely superfluous, or seem so, like ZeroMemory and MoveMemory, but are actually exported so that they can be used from environments where there is no C Runtime Library, like classic Visual Basic (VB 6).
It is also interesting to point out that many of the "simple" C Runtime Library functions are implemented by Microsoft's (and other vendors') compiler as intrinsic functions, with special handling. This means that they can be highly optimized. Basically, the relevant object code is emitted inline, directly in your application's binary, avoiding the need for a potentially expensive function call. Allowing the compiler to generate inlined code for something like strlen, that gets called all the time, will almost undoubtedly lead to better performance than having to pay the cost of a function call to one of the exported Windows APIs. There is no way for the compiler to "inline" lstrlen; it gets called just like any other function. This gets you back to the classic tradeoff between speed and size. Sometimes a smaller binary is faster, but sometimes it's not. Not having to link the CRT will produce a smaller binary, since it uses function calls rather than inline implementations, but probably won't produce faster code in the general case.
* However, you really should be linking to the C Runtime Library bundled with your compiler, for a variety of reasons, not the least of which is security updates that can be distributed to all versions of the operating system via updated versions of the runtime libraries. You have to have a really good reason not to use the CRT, such as if you are trying to build the world's smallest executable. And not having these functions available will only be the first of your hurdles. The CRT handles a lot of stuff for you that you don't normally even have to think about, like getting the process up and running, setting up a standard C or C++ environment, parsing the command line arguments, running static initializers, implementing constructors and destructors (if you're writing C++), supporting structured exception handling (SEH, which is used for C++ exceptions, too) and so on. I have gotten a simple C app to compile without a dependency on the CRT, but it took quite a bit of fiddling, and I certainly wouldn't recommend it for anything remotely serious. Matthew Wilson wrote an article a long time ago about Avoiding the Visual C++ Runtime Library. It is largely out of date, because it focuses on the Visual C++ 6 development environment, but a lot of the big picture stuff is still relevant. Matt Pietrek wrote an article about this in the Microsoft Journal a long while ago, too. The title was "Under the Hood: Reduce EXE and DLL Size with LIBCTINY.LIB". A copy can still be found on MSDN and, in case that becomes inaccessible during one of Microsoft's reorganizations, on the Wayback Machine. (Hat tip to IInspectable and Gertjan Brouwer for digging up the links!)
If your concern is just the need to distribute the C Runtime Library DLL(s) alongside your application, you can consider statically linking to the CRT. This embeds the code into your executable, and eliminates the requirement for the separate DLLs. Again, this bloats your executable, but does make it simpler to deploy without the need for an installer or even a ZIP file. The big caveat of this, naturally, is that you cannot benefit to incremental security updates to the CRT DLLs; you have to recompile and redistribute the application to get those fixes. For toy apps with no other dependencies, I often choose to statically link; otherwise, dynamically linking is still the recommended scenario.
There are some C runtime functions in NtDll. According to Windows Internals these are limited to string manipulation functions. There are other equivalents such as using HeapAlloc instead of malloc, so you may get away with it depending on your requirements.
Although these functions are acknowledged by Microsoft publications and have been used for many years by the kernel programmers, they are not part of the official Windows API and you should not use of them for anything other than toy or demo programs as their presence and function may change.
You may want to read a discussion of the option for doing this for the Rust language here.
Does that mean that I can call them dynamically at runtime through
LoadLibrary and GetProcAddress?
yes. even more - why not use ntdll.lib (or ntdllp.lib) for static binding to ntdll ? and after this you can direct call this functions without any GetProcAddress
Is this guaranteed to be the case for every Windows version?
from nt4 to win10 exist many C runtime functions in ntdll, but it set is different. usual it grow from version to version. but some of then less functional compare msvcrt.dll . say for example printf from ntdll not support floating point format, but in general functional is same
it is possible to drop the C runtime library altogether (by just using
the CRT functions from NtDll), therefore making my program smaller?
yes, this is 100% possible.

How can I replay a multithreaded application?

I want to record synchronization operations, such as locks, sempahores, barriers of a multithreaded application, so that I can replay the recorded application later on, for the purpose of debugging.
On way is to supply your own lock, sempaphore, condition variables etc.. functions that also do logging, but I think that is an overkill, because down under they must be using some common synchronization operations.
So my question is which synchronization operations I should log so that I require minimum modifications to my program. In other words, which are the functions or macros in glibc and system calls over which all these synchronization operations are built? So that I only modify those for logging and replaying.
The best I can think of is debugging with gdb in 'record' mode:
Gdb process record/replay execution log
According to this page: GDB Process Record threading support is underway, but it might not be complete yet.
Less strictly answering your question, may I suggest
Helgrind
DRD
On other platforms, several other threading checkers exist, but I haven't got much experience with them.
In your case, an effective method of "logging" systems calls on Linux may be to use the LD_PRELOAD trick, and over-ride the actual system calls with your own versions of the calls that will log the use of the call and then forward to the actual system call.
A more extensive example is given here in Linux Journal.
As you can see at these links, the basic gist of the "trick" is that you can make the system load your own dynamic library before any other system libraries, such as pthreads, etc., and then mask the calls to those library functions by placing your own versions of those functions as the precendent. You can then, inside your over-riding function, log the use of the original function, as well as pass on the arguments to the actual call you're attempting to log.
The nice thing about this method is it will catch pretty much any call you can make, both a function that remains entirely in user-land, as well as a function that will make a kernel call.
So GDB record mode doesn't support multithreading, but the RR record/replay system absolutely does: https://rr-project.org/ .
For a commercial solution with fewer technical restrictions, there's also UDB: https://undo.io/solutions/ .
I've worked on debuggers for some years now and from what I've seen, the GDB record+replay stuff is really not ready for primetime, for this and other reasons (eg, slowdown & huge memory requirements).
If you can get it to work in your dev environment, record+replay/reversible debugging can be pretty gamechanging for your workflow; I hope you find a way to leverage it.

Can I execute any c made prog without any os platform?

I googled about it and somewhere I read ....
Yes, you can. That is happening in the case of embedded systems
I think NO, it's not possible. Any platform must have an operating system. Or else, your program must itself be an OS.
Either soft or hard-wired. Without an operating system your component wouldn't work.
Am I right or can anybody explain me the answer? (I dont have any idea abt embedded systems...)
Of course you can. All a (typical) CPU needs is power and access to a memory, then it will execute its hard-coded boot sequence.
Typically this will involve reading some pre-defined address, interpreting the contents there as instructions, and starting to run them.
These instructions could of course come from a C program, although at this level it's more common to write the very early stages (called bootstrapping) in assembly.
This of course doesn't mean, if I were to read your question title literally, that any C program be run this way. If the program assumes there is an OS, but there isn't, it won't work. This should be pretty obvious.
You can run a program in a system without an Operating System ... and that program need not be an Operating System itself.
Think about all the computers (or processors if you prefer) inside a car: engine management, air conditioning, ABS, ..., ...
All of those system have a program (possibly written in C) running. None of the processors have an OS.
The Standard specifically differentiates between hosted implementations and freestanding implementations:
5.1.2.1 Freestanding environment
1 In a freestanding environment (in which C program execution may take place
without any benefit of an operating system), the name and type of the
function called at program startup are implementation-defined. Any library
facilities available to a freestanding program, other than the minimal set
required by clause 4, are implementation-defined.
2 The effect of program termination in a freestanding environment is
implementation-defined.
5.1.2.2 Hosted environment
1 A hosted environment need not be provided, but shall conform to the
following specifications if present.
...
I think you would have fun writing 'toy' kernels that are designed to run under simulators like QEMU (or virtualization platforms, Xen + MiniOS is one of my favorites). With not (much) difficulty, you could get a basic console up and running and start printing things to it. Its really fun, educational and satisfying all at once.
If you are working on x86 .. and get your spiffy kernel working under QEMU .. there's a very good chance that it will also work on real hardware. You might enjoy it.
Anyway, the answer to your question is most decidedly yes. Its especially easier if you happen to be using a boot loader .. for instance, google memtest86 and grab the code.
Usually, any C program will have a variety of system calls which depend on the operating system. For example, printf makes a system call to write to the screen buffer. Opening files and things like that are also system calls.
So basically, you can run the C code which just gets compiled and assembled in to machine code on a processor, but if the code makes any system calls, it would just freeze up the processor when it tries to jump to a memory location that it thinks is the operating system. This of course would depend on your being able to get the program running in the first place, which is not easy without the operating system as well.
Embedded systems are legitimate OS's in their own right, they're just not general purpose OS's. Any userland program (i.e. a program that is not itself an operating system) needs an operating system to run on top of.
As an example: Building Bare-Metal ARM Systems with GNU
Many embedded systems do not have enough resources for a full OS, some may use a scheduler kernel or RTOS, others are coded 'bare metal'. The main() C entry point is entered after reset. Only a small amount of assembler code is required to initialise a microprocessor, to execute C code. All C requires to run generally is a stack - usually simply a case of initialising the stack pointer to a specific address. Some processor specific initialisation of interrupt/exception vectors, system clocks, memory controllers etc. may be necessary also.
On a desktop PC, typically you have a BIOS that handles basic hardware initialisation such as SDRAM controller setup and timing, and then bootstrapping from a disk boot-sector, which then in turn bootstraps an OS. Any of that code could be written in C (and some of it probably is), and it could do something other than boot an OS - it could do anything - it is just code.
OSs are useful for non-dedicated computing devices where the end user many select one of many programs to execute and possibly several simultaneously. Most embedded systems do just one thing, the software is often loaded from ROM or executes directly from ROM, and is never changed and executes indefinitely (usually stopped only by power-down).
You still of course might implement device drivers and the like, but often these are an integral part of the application rather than a separate entity. Even when you do use an RTOS in an embedded system, it is still generally integral to your application rather than an OS in the sense you might understand. In these cases the RTOS is simply a library like any other, and is often initialised and started from main() rather then the other way around as you might expect.
every piece of hardware has to have a piece of software that operates it, be it embedded firmware (smaller and relatively fixed, like vxworks) or an operating system software that can run complex arbitrary code on top of it (like windows, linux, or mac).
think of it as a stack. at the bottom, you have the hardware. on top of that, a piece of software that can control that hardware. on top of that, you can have all sorts of stuff. in the case of a voip phone, you'll have vxworks controlling the hardware, and a layer on top of that that handles all the phone applications.
so going back to your question, yes, you CAN run any c program on anything, BUT it depends what kind of c program it is. if it's a low level c program that can talk to hardware, then you dont need anything other than your program and the hardware. if it's a higher level c program (like a chat program), then you need a whole bunch of stuff between your program and the hardware.
make sense?
Obviously, you cannot execute any arbitrary C program without some sort of OS or OS-equivalent. Similarly, I can write a C program under Linux that won't run under Microsoft Windows.
However, you can write C programs on almost anything. It's a popular language to write software for embedded systems in, and they very often don't have an OS.
Many embedded systems have just a CPU hooked up to a ROM, with pins coming out of the chip that are directly attached to inputs and outputs. There is no user I/O, no file system, no process scheduling, nothing you'd typically want an OS for. In those cases, a C programmer might write a program that is burned into a ROM, which will handle everything itself.
(Some embedded systems are more complicated, and can use an OS. Linux is frequently used, since it's free for the use, can be made very compact, and can be changed at any level. Not all do, though.)
You definitely don't need an OS to run your C code on any system. What you will need is two pieces of initialization code - one to initialize the hardware needed (processor, clock, memory) and another to set up your stack and C runtime (i.e. intialization of data and BSS sections). This, of course, means that you cannot take advantage of the multithreading, messaging and synchronization services that an OS would provide. I'll try and break it down into some steps to give you an idea:
Write a "reset_routine" that run when the board starts. This will initialize the clock and any external memory needed. (This routine will have to execute from a memory that is either internal or one that can be initialized and programmed externally).
The reset_routine, after the hardware initializations, transfers control to a "sw_runtime_init" routine that will set up the stack and the globals definied by you application. (Do a jump from reset_routine to sw_runtime_init instead of a call to avoid stack usage).
Compile and link this to you application, whilst ensuring that the "reset_routine" is linked to the location where the reset vector points to.
Load this onto your target and pray.

Portable thread-safety in C?

Purpose
I'm writing a small library for which portability is the biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more. The set of functions provided by the library all operate (read/write) on an internal data structure. I've considered some other design alternatives, but nothing else seems feasible for what the library is trying to achieve.
Question
Are there any portable algorithms, techniques, or incantations which can be used to ensure thread-safety? I am not concerned with making the functions re-entrant. Moreover, I am not concerned with speed or (possibly) wasting resources if the algorithm/technique/incantation is portable. Ideally, I don't want to depend on any libraries (such as GNU Pth) or system-specific operations (like atomic test-and-set).
I have considered modifying Lamport's bakery algorithm, but I do not know how to alter it to work inside of the functions called by the threads instead of working in the threads themselves.
Any help is greatly appreciated.
Without OS/hardware support, at least an atomic CAS, there's nothing you can do that's practical. There are portable libraries that abstract various platforms into a common interface, though.
http://www.gnu.org/software/pth/related.html
Almost all systems (even Windows) can run libpthread these days.
Lamport's bakery algorithm would probably work; unfortunately, there are still practical problems with it. Specifically, many CPUs implement out-of-order memory operations: even if you've compiled your code into a perfectly correct instruction sequence, the CPU, when executing your code, may decide to reorder the instructions on the fly to achieve better performance. The only way to get around this is to use memory barriers, which are highly system- and CPU-specific.
You really only have two choices here: either (1) keep your library thread-unsafe and make your users aware of that in the documentation, or (2) use a platform-specific mutex. Option 2 can be made easier by using another library that implements mutexes for a large variety of platforms and provides you with a unified, abstract interface.
Functions either cannot be thread safe or are innately thread safe, depending on how you want to look at it. And threading/locking is innately platform specific. Really, it is up to users of your library to handle the the threading issues.

What's the difference between "C system calls" and "C library routines"?

There are multiple sections in the manpages. Two of them are:
2 Unix and C system calls
3 C Library routines for C programs
For example there is getmntinfo(3) and getfsstat(2), both look like they do the same thing. When should one use which and what is the difference?
System calls are operating system functions, like on UNIX, the malloc() function is built on top of the sbrk() system call (for resizing process memory space).
Libraries are just application code that's not part of the operating system and will often be available on more than one OS. They're basically the same as function calls within your own program.
The line can be a little blurry but just view system calls as kernel-level functionality.
Libraries of common functions are built on top of the system call interface, but applications are free to use both.
System calls are like authentication keys which have the access to use kernel resources.
Above image is from Advanced Linux programming and helps to understand how the user apps interact with kernel.
System calls are the interface between user-level code and the kernel. C Library routines are library calls like any other, they just happen to be really commonly provided (pretty much universally). A lot of standard library routines are wrappers (thin or otherwise) around system calls, which does tend to blur the line a bit.
As to which one to use, as a general rule, use the one that best suits your needs.
The calls described in section 2 of the manual are all relatively thin wrappers around actual calls to system services that trap to the kernel. The C standard library routines described in section 3 of the manual are client-side library functions that may or may not actually use system calls.
This posting has a description of system calls and trapping to the kernel (in a slightly different context) and explains the underlying mechanism behind system calls with some references.
As a general rule, you should always use the C library version. They often have wrappers that handle esoteric things like restarts on a signal (if you have requested that). This is especially true if you have already linked with the library. All rules have reasons to be broken. Reasons to use the direct calls,
You want to be libc agnostic; Maybe with an installer. Such code could run on Android (bionic), uClibc, and more traditional glibc/eglibc systems, regardless of the library used. Also, dynamic loading with wrappers to make a run-time glibc/bionic layer allowing a dual Android/Linux binary.
You need extreme performance. Although this is probably rare and most likely misguided. Probably rethinking the problem will give better performance benefits and not calling the system is often a performance win, which the libc can occasionally do.
You are writing some initramfs or init code without a library; to create a smaller image or boot faster.
You are testing a new kernel/platform and don't want to complicate life with a full blown file system; very similar to the initramfs.
You wish to do something very quickly on program startup, but eventually want to use the libc routines.
To avoid a known bug in the libc.
The functionality is not available through libc.
Sorry, most of the examples are Linux specific, but the rationals should apply to other Unix variants. The last item is quite common when new features are introduced into a kernel. For example when kqueue or epoll where first introduced, there was no libc to support them. This may also happen if the system has an older library, but a newer kernel and you wish to use this functionality.
If your process hasn't used the libc, then most likely something in the system will have. By coding your own variants, you can negate the cache by providing two paths to the same end goal. Also, Unix
will share the code pages between processes. Generally there is no reason not to use the libc version.
Other answers have already done a stellar job on the difference between libc and system calls.

Resources