A way to accept a network connection while running other tasks? - c

I am building a server application that is supposed to do text processing in the background but it's task changes based on signals from a client application. My problem is that I can't do the programs primary job while waiting for connections. Is there anyway to run this job at the same time? I have looked at multithreading, however because the application is supposed to maintain an internal state while running I can't work out how make it function in this way. The program is written in C.

If you have to maintain internal state that all threads need access to, you need synchronization. Every thread comes with its own stack, but they all share the heap. If you access an object on the thread, you need to make sure your thread obtains a lock on that state (possibly wait until you can get it) and then changes the state, releases the lock and so on.
The common way to do this on POSIX systems is the pthread API. C11 has added standardized threading support to the language which can be found in the header threads.h, but support for it is very rare.
Alternatively, you can also use multiple processes. That would change how you communicate between threads but the general model of your application would remain the same.

Related

Multithreaded reads and writes to a single SQlite database using the C API

My application consists of a number of threads (typically 5-10) and each is responsible for reading a value from an SQLite database, working on it for an amount of time and then writing a new value back to the database.
Each of the threads is running by itself without any synchronization.
My question is: Do I need to write the synchronization code myself or it is possible to interact with the SQLite C API in such a way that this gets taken care of for you. IE. If there is a transaction in progress to write a value and a different thread tries to write or read the same row, SQLite will block until it's okay to do so?
Do I need to write the synchronization code myself or it is possible to interact with the SQLite C API in such a way that this gets taken care of for you.
The SQLite documentation covers this. In a nutshell, SQLite has three different threading models available:
single-thread, in which all internal mutexes are disabled, and the SQLite API cannot safely be used by multiple threads at the same time;
multi-thread, in which the library can safely be used concurrently by multiple threads, but database connections can be used only by a single thread at a time; and
serialized, in which multiple threads can use the API concurrently without restriction.
There is a built-in default selected at compile time (which is "serialized" mode in a standard build). A different mode can be selected during library initialization (sqlite3_config()), and a per-connection thread mode can be specified when you open a new connection, except that single-thread mode cannot be overridden.
But note well that under all circumstances, SQLite provides a single transaction at a time per connection. Thus, you may need your own synchronization even in "serialized" mode.
If there is a transaction in progress to write a value and a different thread tries to write or read the same row, SQLite will block until it's okay to do so?
If an SQLite connection is in serialized mode and has autocommit enabled, then you're fine, in the sense that each statement will be executed in its own transaction, and different threads will not interfere with each other (but they may counteract each other). If a connection is instead in the weaker "multi-thread" mode then you must provide your own synchronization, so that different threads do not attempt to use the connection concurrently. If autocommit is disabled, then you will probably need to synchronize even in "serialized" mode to accommodate multiple threads using the same connection, else you will be unable to effectively control transaction boundaries and the contents of each transaction.
With respect to a single, established connection, there is no meaningful difference between "multi-thread" mode and "single-thread" mode.
If there is a transaction in progress to write a value and a different thread tries to write or read the same row, SQLite will block until it's okay to do so?
The easy way to do this is use mult-threaded mode, with one connection object per thread. Then threads can acquire a lock on the database with a BEGIN IMMEDIATE transaction, and if you get a SQLITE_BUSY error, the thread can either do something else for a while before trying again, or if you had set up a busy timeout ahead of time to a reasonable for your needs timeout, the thread will just sleep for up to that length of time trying to acquire the transaction lock periodically before giving up.
If you use serialized mode and a single connection for the entire program, you have to include all the logic and locking to make sure that only one particular thread can access the database when a transaction is active yourself. Much easier to use sqlite's native, well tested, support for that functionality.

Erlang spawning large amounts of C processes

I've been looking into how I could embed languages (let's use Lua as an example) in Erlang. This of course isn't a new idea and there are many libraries out there that can do this. However I was wondering if it was possible to start a Genserver with state which is modified by Lua. This means that once you start the Genserver, it will start a (long running) Lua process to manipulate the Genserver's state. I know this is possible as well, but I was wondering if I could spawn 1,000 10,000 or even 100,000 of these processes.
I'm not really familiar with this topic but I have done some research.
(Please correct me if I'm wrong on any of these options).
TLDR; Skip to the last paragraph.
First option: NIFs:
This doesn't seem like an option since it will block the Erlang Scheduler of the current process. If I want to spawn a large amount of these it will freeze the entire runtime.
Second option: Port Driver:
It's like a NIF but communicates by sending data to a specified port, which can also send data back to Erlang. This is nice although this also seems to block the scheduler. I've tried a library which does the boiler plat for you as well, but that seemed to block the scheduler after spawning 10 processes. I've also looked into the postgresql example on the Erlang Documentation which is said to be async but I couldn't get the example code to work (R13?). Is it even possible to run as many Port Driver processes without blocking the runtime?
Third option: C Nodes:
I thought this was very interesting and wanted to try it out, but apparently the project "erlang-lua" already does this. It's nice because it won't crash your Erlang VM if something goes wrong and the processes are isolated. But in order to actually spawn a single process you need to spawn an entire node. I have no idea how expensive this is. Nor am I sure what the limit is for connecting nodes in a cluster, but I don't see myself spawning 100,000 C nodes.
Fourth option: Ports:
At first I thought this was the same as a Port Driver but it's actually different. You spawn a process which executes an application and communicates through STDIN and STDOUT. This would work well for spawning a large amount of processes, and (I think?) they aren't a threat to the Erlang VM. But if I'm going to communicate through STDIN / STDOUT, why even bother with an embeddable language to begin with? Might as well use any other scripting language.
And so after much research in a field I'm not familiar with I've come to this. You could a Genserver as an "entity" where the AI is written in Lua. Which is why I'd like to have a processes for each entity. My question is how do I achieve spawning many Genservers which communicate with long running Lua processes? Is this even possible? Should I be tackling my problem differently?
If you can make the Lua code — or more accurately, its underlying native code — cooperate with the Erlang VM, you have a few choices.
Consider one of the most important functions of the Erlang VM: managing the execution of a (potentially large number of) Erlang's lightweight processes across a relatively small set of scheduler threads. It uses several techniques to know when a process has used up its timeslice or is waiting and so should be scheduled out to give another process a chance to run.
You seem to be asking how you can get native code to run however it likes within the VM, but as you've already hinted, the reason native code can cause problems for the VM is that it has no practical way to stop the native code from completely taking over a scheduler thread and thus preventing regular Erlang processes from executing. Because of this, native code has to cooperatively yield the scheduler thread back to the VM.
For older NIFs, the choices for such cooperation were:
Keep the amount of time NIF calls ran on a scheduler thread to 1ms or less.
Create one or more private threads. Transition each long-running NIF call from its scheduler thread over to a private thread for execution, then return the scheduler thread to the VM.
The problems here are that not all calls can complete in 1ms or less, and that managing private threads can be error-prone. To get around the first problem, some developers would break the work down into chunks and use an Erlang function as a wrapper to manage a series of short NIF calls, each of which completed one chunk of work. As for the second problem, well, sometimes you just can't avoid it, despite its inherent difficulty.
NIFs running on Erlang 17.3 or later can also cooperatively yield the scheduler thread using the enif_schedule_nif function. To use this feature, the native code has to be able to do its work in chunks such that each chunk can complete within the usual 1ms NIF execution window, similar to the approach mentioned earlier but without the need to artificially return to an Erlang wrapper. My bitwise example code provides many details about this.
Erlang 17 also brought an experimental feature, off by default, called dirty schedulers. This is a set of VM schedulers that do not have the same native code execution time constraints as the regular schedulers; work there can block for essentially infinite periods without disrupting normal VM operation.
Dirty schedulers come in two flavors: CPU schedulers for CPU-bound work, and I/O schedulers for I/O-bound work. In a VM compiled to enable dirty schedulers, there are by default as many dirty CPU schedulers as there are regular schedulers, and there are 10 I/O schedulers. These numbers can be altered using command-line switches, but note that to try to prevent regular scheduler starvation, you can never have more dirty CPU schedulers than regular schedulers. Applications use the same enif_schedule_nif function mentioned earlier to execute NIFs on dirty schedulers. My bitwise example code provides many details about this too. Dirty schedulers will remain an experimental feature for Erlang 18 as well.
Native code in linked-in port drivers is subject to the same on-scheduler execution time constraints as NIFs, but drivers have two features NIFs don't:
Driver code can register file descriptors into the VM polling subsystem and be notified when any of those file descriptors becomes I/O-ready.
The driver API supports access to a non-scheduler async thread pool, the size of which is configurable but by default has 10 threads.
The first feature allows native driver code to avoid blocking a thread for I/O. For example, instead of performing a blocking recv call, driver code can register the socket file descriptor so the VM can poll it and call the driver back when the file descriptor becomes readable.
The second feature provides a separate thread pool useful for driver tasks that can't conform to the scheduler thread native code execution time constraints. You can achieve the same in a NIF but you have to set up your own thread pool and write your own native code to manage and access it. But regardless of whether you use the driver async thread pool, your own NIF thread pool, or dirty schedulers, note that they are all regular operating system threads, and so trying to start a huge number of them simply isn't practical.
Native driver code does not yet have dirty scheduler access, but this work is on-going and it might become available as an experimental feature in an 18.x release.
If your Lua code can make use of one or more of these features to cooperate with the Erlang VM, then what you're attempting may be possible.

c linux multithreading networking

I have a network application on a gateway. It receives and sends packets. For most of them, my gateway acts as a router, but in some cases, it can receive packets too.
Should I have:
only one main thread
a main thread + a dispatch thread in charge of giving it to the correct flow handler
as many threads as there are flows
something else.
?
Doing multithreading correctly is no simple matter, in many cases a select and friends based solution will be a whole lot easier to create.
Your case sounds a lot like a typical Unix service daemon. The popular solution to your problem is not to use threads, but forks.
The idea is that your program listens on the socket and waits for connections. As soon as a connection arrives, it forks. The child process then continues to process the connection. The father process itself just continues in the loop and waits for incoming connections.
Advantages over threading:
Very simple program design
No problems with concurrency
Established method for Unix/Linux systems
Disadvantages:
Things get complicated when several connections interact with each other (your use case doesn't sound like they would)
Performance penalty on Windows systems (not on Unix systems!)
You can find many code examples online.
I don't know much about networking applications, but I think it's like this:
If you have the ability to react asynchronous to the requests you would probably use just one single thread (like in Node.JS). If you won't be able to react asynchronous the main thread would always block the other actions.
If you are not able to react asynchronous on your requests you have to use more than one thread. But you could achieve that in many different ways: you could create for every request a thread, or a limited number of threads and assign them then to your requests.
My personal preference is use one main thread and one worker thread per connection. No cap whatsoever. I am assuming that your server will be stateless like a HTTP server.
For stateful servers you will have to figure out some way to control number of threads.

What have you used sysv/posix message queues for?

I've never seen any project or anything utilizing posix or sysv message queues - and being curious, what problems or projects have you guys used them for ?
I had a series of commands that needed to be executed in order, but the main program flow did not depend on their completion so I queued them up and passed them to another process via a System V message queue to be executed independently of the main program. Since message queues provide an asynchronous communications protocol, they were a good fit for this task.
To be honest, I used System V message queues because I had never used them before and I wanted to. I'm sure there are other IPC methods I could have used.
It's been a while since I've done any real VxWorks programming, but you can also find message queues used in VxWorks applications. According to the VxWorks Application Programmer's Guide (Google search), the primary intertask communication mechanism within a single CPU is message queues. VxWorks uses two message queue subroutine libraries (POSIX and VxWorks).
I once wrote a text-mode I/O generator utility that had one thread in charge of updating the UI and a number of worker threads to do the actual I/O work. When a worker thread completed an I/O, it sent an update message to the UI thread. I implemented this message system using a POSIX message queue.
Why implement it like this? It sounded like a good idea at the time, and I was curious about how they worked. I figured I could solve the problem and learn something at the same time. There were many different techniques I could have used, and I don't suppose there was any profound reason why I chose this technique. I didn't realize it until later, but I was glad I used a POSIX queue when I had to port the utility to another system (it was also POSIX compliant, so I didn't have to worry about porting external libraries to get my app to run).
You can use it for IPC for sure because it is an IPC mechanism. With this mechanism you can write multi-process event processing applications in which all of the applications are using the queue and each of which are waiting for a special type of message (an special event to occur). When the message arrives that process takes the message, processes that and puts the result back into the queue so that the other process can use it.
Once i wrote such an application using message queues. It is pretty easy to work with and does not need Inter-process synchronization mechanisms such as semaphores. You can use it in place of Shared Memory of Memory Mapped files as well, in situations in which all you need is just sending a structure or some kind of packed data to other processes Message Queues are far easier to use than any other IPC mechanism.
This book contains all information you need to know about Message Queues and other IPC mechanisms in Linux.

Cleanest way to stop a process on Win32?

While implementing an applicative server and its client-side libraries in C++, I am having trouble finding a clean and reliable way to stop client processes on server shutdown on Windows.
Assuming the server and its clients run under the same user, the requirements are:
the solution should work in the following cases:
clients may each feature either a console or a gui.
user may be unprivileged.
clients may be or become unresponsive (infinite loop, deadlock).
clients may or may not be children of the server (direct or indirect).
unless prevented by a client-side defect, clients shall be allowed the opportunity to exit cleanly (free their ressources, sync some data to disk...) and some reasonable time to do so.
all client return codes shall be made available (if possible) to the server during the shutdown procedure.
server shall wait until all clients are gone.
As of this edit, the majority of the answers below advocate the use of a shared memory (or another IPC mechanism) between the server and its clients to convey shutdown orders and client status. These solutions would work, but require that clients successfully initialize the library.
What I did not say, is that the server is also used to start the clients and in some cases other programs/scripts which don't use the client library at all. A solution that did not rely on a graceful communication between server and clients would be nicer (if possible).
Some time ago, I stumbled upon a C snippet (in the MSDN I believe) that did the following:
start a thread via CreateRemoteThread in the process to shutdown.
had that thread directly call ExitProcess.
Unfortunately now that I'm looking for it, I'm unable to find it and the search results seem to imply that this trick does not work anymore on Vista. Any expert input on this ?
If you use thread, a simple solution is to use a named system event, the thread sleeps on the event waiting for it to be signaled, the control application can signal the event when it wants the client applications to quit.
For the UI application it (the thread) can post a message to the main window, WM_ CLOSE or QUIT I forget which, in the console application it can issue a CTRL-C or if the main console code loops it can check some exit condition set by the thread.
Either way rather than finding the client applications an telling them to quit, use the OS to signal they should quit. The sleeping thread will use virtually no CPU footprint provided it uses WaitForSingleObject to sleep on.
You want some sort of IPC between clients and servers. If all clients were children, I think pipes would have been easiest; since they're not, I guess a server-operated shared-memory segment can be used to register clients, issue the shutdown command, and collect return codes posted there by clients successfully shutting down.
In this shared-memory area, clients put their process IDs, so that the server can forcefully kill any unresponsive clients (modulo server privileges), using TerminateProcess().
If you are willing to go the IPC route, make the normal communication between client and server bi-directional to let the server ask the clients to shut down. Or, failing that, have the clients poll. Or as the last resort, the clients should be instructed to exit when the make a request to server. You can let the library user register an exit callback, but the best way I know of is to simply call "exit" in the client library when the client is told to shut down. If the client gets stuck in shutdown code, the server needs to be able to work around it by ignoring that client's data structures and connection.
Use PostMessage or a named event.
Re: PostMessage -- applications other than GUIs, as well as threads other than the GUI thread, can have message loops and it's very useful for stuff like this. (In fact COM uses message loops under the hood.) I've done it before with ATL but am a little rusty with that.
If you want to be robust to malicious attacks from "bad" processes, include a private key shared by client/server as one of the parameters in the message.
The named event approach is probably simpler; use CreateEvent with a name that is a secret shared by the client/server, and have the appropriate app check the status of the event (e.g. WaitForSingleObject with a timeout of 0) within its main loop to determine whether to shut down.
That's a very general question, and there are some inconsistencies.
While it is a not 100% rule, most console applications run to completion, whereas GUI applications run until the user terminates them (And services run until stopped via the SCM). Hence, it's easier to request a GUI to close. You send them the equivalent of Alt-F4. But for a console program, you have to send them the equivalent of Ctrl-C and hope they handle it. In both cases, you simply wait. If the process sticks around, you then shoot it down (TerminateProcess) and pray that the damage is limited. But your HDD can fill up with temporary files.
GUI application in general do not have exit codes - where would they go? And a console process that is forcefully terminated by definition does not exit, so it has no exit code. So, in a server shutdown scenario, don't expect exit codes.
If you've got a debugger attached, you generally can't shutdown the process from another application. That would make it impossible for debuggers to debug exit code!

Resources