Parallel I/O with POSIX threads in C


Is there a simple way in C, using POSIX threads, to hand all the file output of a program (e.g. fprintf...) to a CPU core other than the one executing the main code? I mean in such a way that the computation keeps running and does not have to wait for the file to be written before continuing.
My program does numerical integration and writes data to a file at every integration step.
Thank you.

For how to pin a particular thread to a particular core, see:
how to set CPU affinity of a particular pthread?
However, placing the two threads (the writer thread and the compute thread) on different cores does not remove the need to synchronize them. You still have to synchronize the two threads. Running them on different cores may give you better throughput, but that depends on many other factors.
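As a minimal sketch of the synchronization this answer refers to, assuming a bounded queue protected by a mutex and two condition variables: the compute thread hands each record to a writer thread and continues, blocking only when the queue is full. The names async_writer, async_writer_put, QUEUE_SLOTS and REC_LEN are hypothetical and not from the original post.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define QUEUE_SLOTS 64  /* hypothetical queue capacity */
#define REC_LEN 128     /* hypothetical maximum record length */

/* Bounded queue of text records shared by the compute and writer threads. */
typedef struct {
    char lines[QUEUE_SLOTS][REC_LEN];
    int head, tail, count, done;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    FILE *out;
    pthread_t thread;
} async_writer;

/* Writer thread: drains the queue and performs the file I/O. */
static void *writer_main(void *arg)
{
    async_writer *w = arg;
    char line[REC_LEN];
    for (;;) {
        pthread_mutex_lock(&w->lock);
        while (w->count == 0 && !w->done)
            pthread_cond_wait(&w->not_empty, &w->lock);
        if (w->count == 0) {            /* stop requested and queue drained */
            pthread_mutex_unlock(&w->lock);
            break;
        }
        memcpy(line, w->lines[w->head], REC_LEN);
        w->head = (w->head + 1) % QUEUE_SLOTS;
        w->count--;
        pthread_cond_signal(&w->not_full);
        pthread_mutex_unlock(&w->lock);
        fputs(line, w->out);            /* slow I/O happens outside the lock */
    }
    return NULL;
}

/* Open the file and start the writer thread. Returns 0 on success. */
int async_writer_start(async_writer *w, const char *path)
{
    memset(w, 0, sizeof *w);
    w->out = fopen(path, "w");
    if (!w->out)
        return -1;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->not_empty, NULL);
    pthread_cond_init(&w->not_full, NULL);
    if (pthread_create(&w->thread, NULL, writer_main, w) != 0) {
        fclose(w->out);
        return -1;
    }
    return 0;
}

/* Called by the compute thread; blocks only while the queue is full. */
void async_writer_put(async_writer *w, const char *line)
{
    pthread_mutex_lock(&w->lock);
    while (w->count == QUEUE_SLOTS)
        pthread_cond_wait(&w->not_full, &w->lock);
    snprintf(w->lines[w->tail], REC_LEN, "%s", line);  /* truncates long records */
    w->tail = (w->tail + 1) % QUEUE_SLOTS;
    w->count++;
    pthread_cond_signal(&w->not_empty);
    pthread_mutex_unlock(&w->lock);
}

/* Flush remaining records, join the writer and close the file. */
int async_writer_stop(async_writer *w)
{
    pthread_mutex_lock(&w->lock);
    w->done = 1;
    pthread_cond_signal(&w->not_empty);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_mutex_destroy(&w->lock);
    pthread_cond_destroy(&w->not_empty);
    pthread_cond_destroy(&w->not_full);
    return fclose(w->out);
}
```

In the integration loop, each step would format its record with snprintf and call async_writer_put instead of fprintf. Whether this is faster than plain buffered fprintf depends on the workload, so it is worth measuring.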

Related

How does N<->1 threading model work?

Following on from an earlier question, this is an additional query about the N-1 threading model.
It is taught that a threading model should be chosen before an application is designed.
In the N-1 threading model, a single kernel thread works on behalf of each user process, and the OS scheduler gives CPU time slices to that one kernel thread.
In user space, a programmer uses either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Because the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and the scheduler considers each thread for processor time. So that means every user thread gets a kernel thread.
My question:
So how can the N-1 threading model exist at all? It would be a 1-1 threading model. Please clarify.

In user space, a programmer uses either POSIX pthreads or Windows CreateThread() to spawn multiple threads within a user process. Because the programmer used POSIX pthreads or Windows CreateThread(), the kernel is aware of the user-land threads, and the scheduler considers each thread for processor time. So that means every user thread gets a kernel thread.
That's how 1-to-1 threading works.
This doesn't have to be the case. A platform can implement pthread_create, CreateThread, or whatever other "create a thread" function it offers in whatever way it wants.
My question:
So how can the N-1 threading model exist at all? It would be a 1-1 threading model.
Please clarify.
Precisely as you explained at the beginning of your question: when the programmer creates a thread, instead of creating a thread the kernel is aware of, the platform creates a thread that the userland scheduler is aware of, still using a single kernel thread for the entire process.
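To make the idea of a userland scheduler concrete, here is a rough sketch, assuming the ucontext interface (getcontext, makecontext, swapcontext) that glibc provides, although POSIX has marked it obsolescent. Two cooperative user threads share one kernel thread, and a round-robin loop decides which one runs. The names run_user_threads, user_yield and worker are hypothetical, and a real N:1 runtime would also have to deal with blocking system calls.

```c
#include <string.h>
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx, thread_ctx[NTHREADS];
static char stacks[NTHREADS][STACK_SIZE];
static int finished[NTHREADS];
static int current;                 /* index of the running user thread */
static char trace[64];              /* records the order in which threads ran */
static int trace_len;

/* Give the CPU back to the userland scheduler. */
static void user_yield(void)
{
    swapcontext(&thread_ctx[current], &sched_ctx);
}

/* Body of each user thread: log its letter three times, yielding in between. */
static void worker(int id)
{
    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = (char)('A' + id);
        user_yield();
    }
    finished[id] = 1;
}   /* returning resumes uc_link, i.e. the scheduler */

/* Round-robin scheduler running entirely on the calling kernel thread. */
const char *run_user_threads(void)
{
    trace_len = 0;
    memset(finished, 0, sizeof finished);
    for (int t = 0; t < NTHREADS; t++) {
        getcontext(&thread_ctx[t]);
        thread_ctx[t].uc_stack.ss_sp = stacks[t];
        thread_ctx[t].uc_stack.ss_size = STACK_SIZE;
        thread_ctx[t].uc_link = &sched_ctx;
        makecontext(&thread_ctx[t], (void (*)(void))worker, 1, t);
    }
    int remaining = NTHREADS;
    while (remaining > 0) {
        for (current = 0; current < NTHREADS; current++) {
            if (finished[current])
                continue;
            swapcontext(&sched_ctx, &thread_ctx[current]);
            if (finished[current])
                remaining--;
        }
    }
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_user_threads() returns the string "ABABAB": the two user threads interleave, yet the kernel only ever sees the one thread that called the function.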

Short answer: there is more than Windows and Linux.
Slightly longer answer (EDITED):
Many programming languages and frameworks expose multithreading to the programmer. At the same time, they aim to be portable, i.e., it is not known whether a given target platform supports threads at all. Here, the best way is to implement N:1 threading, either in general or at least for the backends without threading support.
The classic example is Java: the language supports multithreading, while JVMs exist even for very simple embedded platforms that do not support threads. However, there are JVMs (actually, most of them) that use kernel threads (e.g., as far as I know, the JVM by Sun/Oracle).
Another reason a language or platform may not want to hand threading control completely to the operating system is a special implementation feature such as a reactor model or a global language lock. Here, the objective is to let the user runtime system (which does the local scheduling) exploit information about special execution patterns that the OS scheduler has no access to.
Does [1:1 threading] add more space occupancy on User process virtual
address space because of these kernel threads?
Well, in theory, execution flow (processes, threads, etc.) and address space are independent concepts. One can find all kinds of mapping between processes (here used as a general term) and memory spaces: 1:1, n:1, 1:n, n:n. However, the classic approach of threading is that several threads of a process share the memory space of the task (that is the owner of the memory space). And thus, there is usually no difference between user threads and kernel threads regarding the memory space. (One exception is, e.g., the Erlang-VM: here, there exist user threads with isolated memory spaces).

Why does OS require/maintain kernel-land threads?

Below are three threading models that I came across.
Looking at these three architectures, it is new to me that there also exists something called a kernel thread, apart from the user thread introduced as part of POSIX.1c.
This is 1-1 model
This is N-1 model.
This is Hybrid model.
I have been through many questions on SO about kernel threads. This looks like the most relevant link for clarification.
At the process level, for every user process that is loaded by (say) the Linux loader, the kernel does not allocate a corresponding kernel process to execute the machine instructions of that user process. A user process only requests kernel-mode execution when it requires a facility from a kernel module [like malloc()/fork()]. The OS scheduler schedules the user process and assigns it a CPU core.
For example, a user process does not require kernel execution mode to execute the instruction
a=a+2;//a is my local variable in a user level C function
My question:
1)
So, what is the purpose of a kernel-level thread? Why does the OS need to maintain a kernel thread (additionally) for the corresponding user thread of a user-level process? Does a user-mode programmer have any control over choosing one of the above three threading models for a given user process through programming?
Once I understand the answer to the first question, one relevant supplementary question is:
2)
Is it actually the kernel thread that gets scheduled by the OS scheduler, and not the user thread?

I think the use of the term "kernel thread" is a bit misleading in these figures. I know the figures from a book about operating system design and, if I remember correctly, they refer to the way work is scheduled by the operating system.
In the figures, each process has at least one kernel thread assigned that is scheduled by the kernel.
The N-1 model shows multiple user-land threads that are not known to the kernel at all, because the kernel schedules only the process (or, as the figure calls it, a single kernel thread). So for the kernel, each process is a kernel thread. When the process is assigned a slice of processor time, it runs multiple threads itself by scheduling them at its own discretion.
In the 1-1 model, the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. So instead of scheduling a whole process, the kernel switches between threads inside of processes.
The hybrid model combines both principles: lightweight processes are threads known to the kernel and scheduled for execution by it. On top of them, the process implements threads the kernel is not aware of and assigns processor time to them in user-land.
And now, to confuse things completely: there is also a real kind of kernel thread in Linux. But as far as I understand the concept, these threads are used for kernel-space operations only, e.g. when kernel modules need to do things in parallel.

So, What is the purpose of kernel level thread?
To provide a vehicle for assigning a set of resources provided by the OS. The set always includes CPU code execution on a core. Others may include disk, NIC, keyboard, mouse and timers, as requested by syscalls from the thread. The kernel manages access to those resources as they become available and arbitrates resource conflicts, e.g. a request for keyboard input when none is available will remove CPU execution from the thread until keyboard input becomes available.
Why do we need a kernel thread(additionally) for corresponding user
thread of a User level process?
Without a kernel-level thread, the user thread would not be able to obtain execution; it would be dead code and stack. Note that with Linux, the concepts of threads and processes can get somewhat muddied, but nevertheless the fundamental unit of execution is a thread. A process is a higher-level construct whose code must be run by at least one thread (e.g. the one raised by the OS loader to run code at the process entry point when it is first loaded).
Does User mode programmer have any control on choosing any of the
above three threading models for a given User process through
programming?
No, not without a syscall, which means leaving user mode.
Does kernel thread actually get scheduled by OS scheduler but not user
thread
Yes. It is the only thing that is given execution when it can use it, has execution removed when it cannot, and is subject to preemptive removal of the CPU if the OS scheduler requires it for something else.

What is best way to have several child processes in C

I am working on a project that needs to download about 300 pictures from different locations using wget every 20 minutes.
I wrote a C program that reads all the IDs and locations from the database into an array.
For each entry in the array, I call the external wget command to download it.
It works, but it is slow because it downloads them one by one.
My thinking is to use multiple processes, multiple threads or OpenMP to create several children.
Any suggestions on how to do this are appreciated.

Multiple Processes
An error in one process cannot crash another process. This is particularly useful when you will host third-party code (e.g. plugins), and this is the approach that (among others) Google Chrome takes. The disadvantage is that N processes use more system resources than N threads.
Multiple Threads
Uses fewer system resources than an equivalent number of processes. Thread programming is more error prone for many developers, and an error in one thread can affect other threads.
Best Option
For what you are doing, you are unlikely to see a significant difference in resource utilization. Use whichever model lets you write correct code fastest.

Personally, I would go for multiple processes. The wget invocations do not need to share any memory or communicate (other than an exit status, which is only needed by the parent), so threads would not provide any additional benefit (in my opinion). In addition, creating them as processes allows the OS scheduler to decide when best to run each one.
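A minimal sketch of the multi-process approach, assuming fork(), execvp() and wait() on a POSIX system. It runs a list of commands with at most max_parallel children alive at once and counts the ones that fail. The helper names spawn and run_all are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one child running argv[0] with the given arguments.
   Returns the child's pid, or -1 if fork() failed. */
pid_t spawn(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    return pid;
}

/* Run every command, keeping at most max_parallel children alive.
   Returns the number of commands that failed to start or exited non-zero. */
int run_all(char *const *cmds[], int ncmds, int max_parallel)
{
    int running = 0, next = 0, failures = 0;
    if (max_parallel < 1)
        max_parallel = 1;
    while (next < ncmds || running > 0) {
        /* Top up the pool of children. */
        while (next < ncmds && running < max_parallel) {
            if (spawn(cmds[next]) > 0)
                running++;
            else
                failures++;
            next++;
        }
        /* Reap one finished child before starting more. */
        if (running > 0) {
            int status;
            pid_t done = wait(&status);
            if (done > 0) {
                running--;
                if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                    failures++;
            } else if (errno != EINTR) {
                break;              /* unexpected wait() error: give up */
            }
        }
    }
    return failures;
}
```

For the downloads in the question, each entry of cmds would be an argument vector such as {"wget", "-q", url, NULL}, with url taken from the array read from the database. A sensible value for max_parallel depends on the network and the remote servers, so it is worth experimenting with.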

Is there a way to lock a process to a CPU?

I am thinking about developing an application that will, on a six core machine, run six asynchronous tasks, one on each core.
But is it possible to lock the tasks to their own core?
The idea is for them to run mostly by themselves, but to sometimes communicate over a shared memory area. However, I want the tasks to run as undisturbed as possible.

The concept you're looking for is called "thread affinity". How it's implemented and the interfaces to ask for it are OS-specific.
Under Linux, try sched_setaffinity(). glibc may also offer pthread_attr_setaffinity_np().
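A small Linux-specific sketch of setting affinity from inside a program. It uses pthread_setaffinity_np(), a glibc extension closely related to the calls named above, together with sched_getaffinity(). The helper names pin_self_to_cpu and first_allowed_cpu are hypothetical, and _GNU_SOURCE must be defined before any header is included.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU.
   Returns 0 on success, or an errno-style error number on failure. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* Return the lowest-numbered CPU the calling thread is currently
   allowed to run on, or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Each of the six tasks could call pin_self_to_cpu() with its own core number at startup. Note that affinity only restricts where the task may run; it does not stop the kernel from scheduling other work on the same core.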

taskset -c cpunum yourprocess
does what you want.
It is also possible to supply the PID of a running process instead (with taskset -p); since threads have their own IDs on Linux, this way you can bind a single thread to a CPU. If you want to change the CPU affinity from within your own program, use sched_setaffinity().

You cannot lock it, but you can set a CPU affinity for a process.

Just for the record, another method (on Windows), not involving programming:
Open Task Manager, go to the Processes tab, right-click your process and choose Set Affinity...

Parallel Threads in C

I have two threads in my application. Is it possible to execute both the threads simultaneously without sleeping any thread?

You can run the threads in parallel in your application, especially if they are not waiting on each other for input or conditions. For example, one thread may be parsing a file while another is playing a song.
Generally the OS takes care of thread time slicing. At the application level it looks as if the threads are running in parallel, but the OS is time slicing, giving each thread a certain amount of execution time.
With multiple processors or cores, the threads can truly run at the same time. However, the OS decides which threads run where unless you specifically code at a lower level to control which threads run in parallel.
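A minimal sketch of two threads running concurrently with no sleep calls, assuming POSIX threads. Each thread counts independently and the caller simply joins both. The names run_two_threads, count_up and job are hypothetical; whether the two threads actually execute at the same instant is up to the OS scheduler and the number of cores.

```c
#include <pthread.h>

/* Work description and result slot for one thread. */
typedef struct {
    long iterations;
    long result;
} job;

/* Thread body: count up to the requested number of iterations. */
static void *count_up(void *arg)
{
    job *j = arg;
    long n = 0;
    for (long i = 0; i < j->iterations; i++)
        n += 1;
    j->result = n;
    return NULL;
}

/* Start two threads, wait for both, and copy out their results.
   Returns 0 on success, -1 if a thread could not be created. */
int run_two_threads(long iterations, long results[2])
{
    pthread_t t[2];
    job jobs[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        jobs[i].iterations = iterations;
        jobs[i].result = 0;
        if (pthread_create(&t[i], NULL, count_up, &jobs[i]) != 0)
            break;
        started++;
    }
    /* Join whatever was started so no thread outlives jobs[]. */
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started < 2)
        return -1;

    results[0] = jobs[0].result;
    results[1] = jobs[1].result;
    return 0;
}
```

Neither thread touches the other's job structure, so no locking is needed here; the only synchronization is the pthread_join at the end.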

As others have mentioned, it is possible with multiple cores, but it depends on how the OS decides to distribute the threads. As far as I have seen, you have no control over where each thread is run.
For a really good tutorial with some nice explanations and pictures, see this page, which includes code showing how to do multi-threading using the POSIX library.
http://www.pathcom.com/~vadco/parallel.html
Time slicing is hard to observe directly, so your best bet is to test it: for example, have your two threads each count every millisecond and see whether the two counts are identical. If they are not, then at least one thread is being put to sleep by the scheduler.
Most likely both will be put to sleep at some point; the test shows how large the difference between the two threads is.
Once a thread blocks, waiting either to send or to receive data, it is put to sleep so that other threads can run and the OS can keep everything working properly.

Before C11, C itself had no means of writing multi-threaded code (C11 added an optional <threads.h>).
However, POSIX provides libraries that allow you to work with threads in C.
One good article on this topic is How to write multi-threaded software in C and C++.

Yes, if you have multiple processors or a multi-core processor. Each thread can then run on its own core.
