Is fwrite atomic? - c

A simple question:
I need to add some logging to my program.
If two processes use "fwrite" on the same file but not the same file descriptor will the written log messages be atomic or mixed. Is there a length limit?
Is it defined ANSI-C behaviour or implementation defined?
If the later what is on MacOSX, Linux and Windows MSVC?

After doing some research and I've found the following in this link:
POSIX standard requires that C stdio
FILE* operations are atomic.
POSIX-conforming C libraries (e.g, on
Solaris and GNU/Linux) have an
internal mutex to serialize operations
on FILE*s.
It looks like that calls should be atomic, but it depends on your platform. In same link, there is also another paragraph that lets you think that the programmer should take care:
So, for 3.0, the question of "is
multithreading safe for I/O" must be
answered with, "is your platform's C
library threadsafe for I/O?" Some are
by default, some are not; many offer
multiple implementations of the C
library with varying tradeoffs of
threadsafety and efficiency. You, the
programmer, are always required to
take care with multiple threads.
Also, as you have two different FILE* in two different processes, I think you have no choice.

It can be mixed.
If you have more than one thread/process writing to the same file, you need to use locking.
An alternative is to send log messages to a dedicated service/thread. An excellent tool to adopt is syslog, which is surely installed on all unixes and can be run on Windows.

From "man flockfile" on Debian lenny, the stdio functions are thread-safe.
There're thread-unsafe stdio functions, "man unlocked_stdio" for more details.
You can get more information from the man page.

fwrite for visual studio locks the calling thread and is therefore thread-safe

Related

Multithreading implementation itself in C

I am a beginner C/C++ programmer first of all, but I am curious about it.
My question is more theoretical.
I heard that C does not have explicit multithreading (MT) support, however there are libraries which implement this. I found "process.h" header which has to be included for building MT programs, but the thing I don't understand is how the MT itself works.
I know there are threads in CPU (assume it's single core for simplicity) running and there is only one thread per moment. The CPU is switching between threads really fast so that user sees it as a simultaneous work (correct me if not).
But - what really happens when I write the following
beginthread( Thread, 0, NULL ) //or whatever function/class method we use
keeping in mind that C does not have MT support. I mean, how does code tell the PC to run two functions multithreaded while it is not possible by the language explicit methods? I guess there is some "cheat" inside library related to "process.h", but what is that cheat, I can't just find on the web.
To be more specific - I am not asking about how to use MT, but how is it build?
Sorry if was answered earlier, or question is too complicated :)
UPD:
Imagine we have C language. It has functions, variables, pointers etc. I dont know any "special" function type that can run concurrently with other. Unless there are calls to some other functions from it. But then the caller function stops and waits?
Is it so that when I run MT applications, there is a special "global" function that calls my f1() and f2() repeatedly which looks like they were simultaneously working?
First of all, C11 does actually add multithreading support to the standard, so the premise that C does not support multithreading is no longer entirely correct.
However, I'm assuming your question is more to do with how can multithreading be implemented by a C library when standard C does(/did) not provide the necessary tools. The answer lies in the word “standard” – compilers and platforms can provide additional functionality beyond that required by the standard. Using such extra features makes the program/library less portable (i.e., more is required than is specified in the C standard), but the language and function call semantics can still be C.
Perhaps it is helpful to consider a standard library function such as fopen – somewhere inside that function code must eventually be called which could not be written in standard C, i.e., the implementation of the standard library itself must also rely on platform-specific code to access operating system functionality such as the file system. Every implementation of the standard library must thus implement the non-portable parts in a way specific to that platform (this is kind of the point of having a standard library instead of all code being platform-specific). But likewise a multithreading library can be implemented with non-standard features provided by that platform, but using such a library makes the code portable only to the platforms for which the same (or compatible) multithreading library is available.
As for how multithreading itself works, it is certainly outside the scope of what can be answered here, but as a simplified conceptual model on a single processor core, you can imagine the operating system managing “concurrent” processes by running one process for a short time, interrupting it, saving its state (current instruction, registers, etc), loading the saved state of another process, and repeating this. This gives the illusion of concurrent execution though in actual fact it is switching rapidly between different processes. On multi-core systems the execution on different cores can actually be concurrent, but there are typically more processes than there are cores, so this kind of switching will still happen on individual cores. Things are further complicated by processes waiting for something (I/O, another process, a timer, etc). Perhaps it suffice to say that the scheduler is a piece of software managing all of this inside the operating system and the multithreading library communicates with it.
(Note that there are many different ways to implement multithreading and multitasking, and statements in the above paragraph do not apply to all of them.)
It's platform specific. On Windows it eventually goes down to NtCreateThread which uses the assembly instruction syscall to call the operating system. So you can qualify it as a cheat.
On Linux it's the same, just the function with the syscall is called clone instead.

There are how many ways to lock in c

I am quite a newbie to c programming. Until now i only found pthread_mutex_lock can make the code region run only by one thread. Does there are any other ways to implement a lock? Or every other way to do a lock is still use pthread_mutex_lock function?
Threads were only introduced into the ISO C standard with C11, a rather recent edition to the standard so not necessarily widely supported yet.
You need to look into threads.h and the mtx_* functions for an understanding of that.
Before then, pthreads was probably your best bet with its wide implementation although, not being standard C (a), its support wasn't mandated.
For example, Windows has its own way of doing threading, using functions like CreateThread.
However, there are various third-party products such as pthreads-win32 that aim to give pthreads support to Windows, to assist in porting of applications from POSIX-compliant operating systems.
(a) It is a POSIX standard (part of IEEE 1003.1) so that may be good enough for some people.
There is no way to lock in the c language. Operating systems might provide support for locking (without regard for the language), and libraries such as pthreads can take advantage of operating system services, however this is beside the language. (By contast, other languages have native locking built into them, such as through Java's synchronized keyword.)

Programming MacOS-X and the Linux API - POSIX compatible?

I am learning(just finished) C+Algorithms and am a newbie. I wanted to know if the POSIX Linux API is used on a Mac. Linux has functions like pread pwrite readv writev nftw symlink pipe popen posix_self sigprocmask sigaction (system calls). Does the Mac have the same API?? I heard that OS-X is based on a BSD kernel so i was wondering if i could use code written on Linux on OS-X if i stuck to using only POSIX functions. How similar is the OS-X API to the Linux POSIX/SUSv3 API??
The Wikipedia article on POSIX has a section dedicated to compliance. Short answer: yeah, it's going to have all the POSIX functionality you're likely to come up against. And it will probably have more (e.g. a lot of BSD apis that might not actually be POSIX)
If you're asking if you can write code that works on both platforms, the answer is yes. If you're asking if that's easy, the answer is probably no.
POSIX is not a Linux standard, but Linux follows it, as does OSX, BSD, HPUX, Solaris, and even some real time operating systems like QNX (just to name a few).
If you program to a POSIX API as much as possible, porting your code will be much easier.
They're close. There are some BSD things you won't find in Linux, and some Linux things you won't find on BSD. You can nearly always restrict yourself to a subset which is compatible with both platforms without much trouble. POSIX APIs which are not portable are often given the _np designator. You'll also find some things are not (yet) supported on one platform. There is a safe middle ground in the more established APIs. And this can of course vary by which OS versions you are targeting. Implementations may vary slightly if you wander into less common APIs. My first stop is to check the man pages.

Writing a POSIX-compliant kernel

I've wanted to write a kernel for some time now. I already have a sufficient knowledge of C and I've dabbled in x86 Assembler. You see, I've wanted to write a kernel that is POSIX-compliant in C so that *NIX applications can be potentially ported to my OS, but I haven't found many resources on standard POSIX kernel functions. I have found resources on the filesystem structure, environment variables, and more on the Open Group's POSIX page.
Unfortunately, I haven't found anything explaining what calls and kernel functions a POSIX-compliant kernel must have (in other words, what kind of internal structure must a kernel have to comply with POSIX). If anyone could find that information, please tell me.
POSIX doesn't define the internal structure of the kernel, the kernel-to-userspace interface, or even libc, at all. Indeed, even Windows has a POSIX-compliant subsystem. Just make sure the POSIX interfaces defined at your link there work somehow. Note, however, that POSIX does not require anything to be implemented specifically in the kernel - you can implement things in the C library using simpler kernel interfaces of your own design where possible, if you prefer.
It just so happens that a lot of the POSIX compliant OSes (BSD, Linux, etc) have a fairly close relationship between many of those calls and the kernel layer, but there are exceptions. For example, on Linux, a write() call is a direct syscall, invoking a sys_write() function in the kernel. However on Windows, write() is implemented in a POSIX support DLL, which translates the file descriptor to a NT handle and calls NtWriteFile() to service it, which in turn invokes a corresponding system call in ntoskrnl.exe. So you have a lot of freedom in how to do things - which makes things harder, if anything :)
The opengroup.org leaves the decisions about kernel syscalls to each implmentation.
write(), for example has to look and behave as stated, but what it calls underneath is not defined. A lot of calls like write, read, lseek are free to call whatever entrypoint they want inside the kernel.
So, no, there really is nothing that says you have to have a certain function name with a defined set of semantics available in the kernel. It just has to available in the C runtime library.

What's the difference between "C system calls" and "C library routines"?

There are multiple sections in the manpages. Two of them are:
2 Unix and C system calls
3 C Library routines for C programs
For example there is getmntinfo(3) and getfsstat(2), both look like they do the same thing. When should one use which and what is the difference?
System calls are operating system functions, like on UNIX, the malloc() function is built on top of the sbrk() system call (for resizing process memory space).
Libraries are just application code that's not part of the operating system and will often be available on more than one OS. They're basically the same as function calls within your own program.
The line can be a little blurry but just view system calls as kernel-level functionality.
Libraries of common functions are built on top of the system call interface, but applications are free to use both.
System calls are like authentication keys which have the access to use kernel resources.
Above image is from Advanced Linux programming and helps to understand how the user apps interact with kernel.
System calls are the interface between user-level code and the kernel. C Library routines are library calls like any other, they just happen to be really commonly provided (pretty much universally). A lot of standard library routines are wrappers (thin or otherwise) around system calls, which does tend to blur the line a bit.
As to which one to use, as a general rule, use the one that best suits your needs.
The calls described in section 2 of the manual are all relatively thin wrappers around actual calls to system services that trap to the kernel. The C standard library routines described in section 3 of the manual are client-side library functions that may or may not actually use system calls.
This posting has a description of system calls and trapping to the kernel (in a slightly different context) and explains the underlying mechanism behind system calls with some references.
As a general rule, you should always use the C library version. They often have wrappers that handle esoteric things like restarts on a signal (if you have requested that). This is especially true if you have already linked with the library. All rules have reasons to be broken. Reasons to use the direct calls,
You want to be libc agnostic; Maybe with an installer. Such code could run on Android (bionic), uClibc, and more traditional glibc/eglibc systems, regardless of the library used. Also, dynamic loading with wrappers to make a run-time glibc/bionic layer allowing a dual Android/Linux binary.
You need extreme performance. Although this is probably rare and most likely misguided. Probably rethinking the problem will give better performance benefits and not calling the system is often a performance win, which the libc can occasionally do.
You are writing some initramfs or init code without a library; to create a smaller image or boot faster.
You are testing a new kernel/platform and don't want to complicate life with a full blown file system; very similar to the initramfs.
You wish to do something very quickly on program startup, but eventually want to use the libc routines.
To avoid a known bug in the libc.
The functionality is not available through libc.
Sorry, most of the examples are Linux specific, but the rationals should apply to other Unix variants. The last item is quite common when new features are introduced into a kernel. For example when kqueue or epoll where first introduced, there was no libc to support them. This may also happen if the system has an older library, but a newer kernel and you wish to use this functionality.
If your process hasn't used the libc, then most likely something in the system will have. By coding your own variants, you can negate the cache by providing two paths to the same end goal. Also, Unix
will share the code pages between processes. Generally there is no reason not to use the libc version.
Other answers have already done a stellar job on the difference between libc and system calls.

Resources