Using PPM on top of Aerys - amphp

There is a "Don’t use any blocking I/O functions in Aerys." warning at https://amphp.org/aerys/io#blocking-io. Should I use PPM instead of Aerys if I need to use PDO (e.g., Prooph components) and want to reuse an initialized application instance for handling different requests?
I'm not bound to any existing PPM adapter (e.g., Symfony). Is there a way to reuse Aerys code (e.g., Router) for request-response logic when using PPM on top of Aerys (https://github.com/php-pm/php-pm/pull/267)?

If you want to use blocking functions, you can just increase the worker count with the -w switch of the command line script. It's definitely not optimal, but with enough workers the blocking shouldn't be too noticeable, apart from some increase in latency.
Another possibility is to move the blocking calls into one or multiple worker threads with amphp/parallel.
As long as the responses are relatively fast, everything should be fine. Problems begin under heavy load, when things get slower and requests might time out, because each blocking call then holds up its worker for a long time.
PHP-PM doesn't offer much benefit over using Aerys directly. It redirects requests to a currently free worker, but under high enough load the kernel's load balancing will probably be good enough, and not all of the slower requests will be routed to one worker. In fact, using Aerys will probably be better, because it's production ready and has multiple independent workers instead of one master that might be a bottleneck. PHP-PM could solve that in a better way, but it's currently not implemented. Additionally, Aerys supports keep-alive connections, which PHP-PM currently does not support.
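The worker-thread idea mentioned above is not specific to PHP. As a rough illustration only, here is a minimal sketch in Python's asyncio rather than amphp/parallel (whose API is not shown here). The blocking_query function is a hypothetical stand-in for a blocking PDO-style call; the point is just that the event loop hands the blocking work to a thread and stays free to serve other requests.

```python
import asyncio
import time


def blocking_query(n):
    # Hypothetical stand-in for a blocking call such as a database query.
    time.sleep(0.05)
    return n * 2


async def handle_request(n):
    # Run the blocking call in a worker thread so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_query, n)


async def main():
    # Three "requests" are served concurrently despite the blocking calls.
    return await asyncio.gather(*(handle_request(n) for n in range(3)))


results = asyncio.run(main())
```

The same trade-off applies as with more workers: the threads still block, so this only hides latency as long as the pool is large enough for the load.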

Related

Do SQL queries block the Node.js event loop?

I'm going to create a web API using pure Node.js that does CRUD operations on SQL Server and returns results to clients. The queries are fairly long running (around 3 seconds) and the request rate is high (around 30 requests per second). I'm using the mssql package with a callback function to return the result once it's ready.
I've already read a lot about Node and I know it's a good fit for I/O-intensive rather than CPU-intensive apps, and also that the event loop shouldn't be blocked because it's single threaded...
My question: Is Node.js suitable for this (SQL-intensive) scenario? Are there any performance issues with using Node.js for this case?
Thanks
Node.js has gone all-in on non-blocking code to the degree that pretty much any function that's blocking in the Node.js API will be labelled as Sync.
Every database driver I've seen follows the model of requiring callbacks, using Promises, or in some cases both.
As a Node.js developer you must read the documentation carefully to look for any potentially blocking calls, and you need to employ the correct concurrency method to handle asynchronous operations. Normally you don't need to overly concern yourself with the details of how long any given operation takes, but you should still be careful when doing things that are slow. Process data in smaller chunks (e.g. row by row) instead of all at once.
Even though it's just single threaded, Node.js can perform very well under load because it's very quick to switch between asynchronous operations. You can also scale up by having multiple Node.js processes working in parallel quite easily, especially if you're using a message bus or HTTP-type fan-out through a load balancer.
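To make the claim about single-threaded concurrency concrete, here is a minimal sketch in Python's asyncio (not Node.js, and not the mssql package). The fake_query coroutine is a hypothetical stand-in for an asynchronous driver call that only waits on the database; 30 such waits overlap on one thread, so the total time stays close to the latency of a single query rather than the sum.

```python
import asyncio
import time


async def fake_query(query_id, seconds):
    # Hypothetical stand-in for a driver call that waits on the database.
    await asyncio.sleep(seconds)
    return query_id


async def serve_all(count, seconds):
    # All queries wait at the same time on a single thread.
    start = time.monotonic()
    ids = await asyncio.gather(*(fake_query(i, seconds) for i in range(count)))
    return ids, time.monotonic() - start


ids, elapsed = asyncio.run(serve_all(30, 0.1))
```

Run one after another, these 30 waits would take about 3 seconds; the loop only stalls if a handler does CPU-heavy or synchronous work between the awaits.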

Lustre file locking for concurrent access

I'm trying to develop an application that will be running on multiple computers linked to a shared Lustre storage, performing various actions, including but not limited to:
Appending data to a file.
Reading data from a file.
Reading from and writing to a file, modifying all of its content past a certain offset.
Reading from and writing to a file, modifying its content at a specific offset.
As you can see, the basic I/O one can wish for.
Since it's concurrent for most of that, I ought to need some kind of locking to allow safely doing the different writings, but I've seen Lustre doesn't support flock(2)s by default (and I'm not sure I want to use it over fcntl(2), I guess I will if it comes to it), and I haven't seen anything about fcntl(2) to confirm its support.
Researching it mostly resulted in me reading a lot of papers about I/O optimization using Lustre, but those usually explain how the structure of their hardware / software / network works rather than explaining how it's done in the code.
So, can I use fcntl(2) with Lustre? Should I use it? If not, what are other alternatives to allow different clients to perform concurrent modifications of the data?
Or is it even possible? (I've seen in Lustre tickets that mmap is possible, so fcntl should work too (there is no real logic behind that statement), but there might be limitations I would want to be aware of.)
I'll keep on writing a test application to check it out, but I figured I should still ask in case there are better alternatives (or if there are limitations to its functionalities that I should be aware of, since my test will be limited and we don't want unknown limitations to become an issue later in the development process).
Thanks,
Edit: The base question has been properly answered by LustreOne; here I give more specific information about my use case so that people can add pertinent additional information about Lustre concurrent access.
The Lustre clients will be servers to other applications.
Clients of those applications will each have their own set of files, but we want to support allowing clients to log to their client space from multiple machines at the same time and, for that purpose, we need to allow concurrent file read and write.
These, however, will always be a pretty small percentage of total I/O operations.
While really interesting insights were given in LustreOne's answer, not many of them apply to this use case (or rather, they do apply, but adding the complexity to the overall system might not be desired for the impact on performances).
That is, for the use case considered at present; I'm sure they can be of much help to some, and to ourselves later on. However, what we are seeking right now is more a way to easily let two nodes, or two threads on one node, that respond to two requests to modify the same data allow one through and detect the conflict, effectively preventing the other client's modification.
I believed file locking would be enough for that use case, but had a preference for byte-range locking since some of the most affected files are being appended to non-stop by some clients, and read/modified up to the end by others.
However, judging from what I understood from LustreOne's answer:
That said, there is no strict requirement for this if your application
knows what it is doing. Lustre will already keep non-overlapping
writes consistent, and can handle concurrent O_APPEND writes as well.
The latter case is already managed by Lustre out of the box.
Any opinion on what the best alternatives could be? Will using a simple flock() on the complete file be enough?
Note that some files will also have an index, which can be used to determine the availability of data without locking any of the data file. Should that be used, or are byte-range locks quick enough for us to avoid increasing the codebase size to support both cases?
A final mention on mmap. I'm pretty sure it doesn't fit our use case much since we have so many files and many clients, so the OSTs might not be able to cache much, but to be sure... should it be used, and if so, how? ^^
Sorry for being so verbose, it's one of my bad traits. :/
Have a nice day,
You should mount all clients with the "-o flock" mount option to enable globally coherent locking. Then flock() (and I think fcntl() locking) will work.
That said, there is no strict requirement for this if your application knows what it is doing. Lustre will already keep non-overlapping writes consistent, and can handle concurrent O_APPEND writes as well. However, since Lustre has to do internal locking for appends, this can hurt write performance significantly if there are a lot of different clients appending to the same file concurrently. (Note this is not a problem if only a single client is appending).
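For reference, byte-range locking with fcntl looks roughly like the sketch below, written in Python on a local temporary file. Whether it behaves the same on a Lustre mount depends on the "-o flock" option discussed above and is not verified here; the file name and the locked_write helper are made up for the illustration.

```python
import fcntl
import os
import tempfile


def locked_write(fd, offset, data):
    # Take an exclusive advisory lock on just the byte range being modified.
    fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
    try:
        os.pwrite(fd, data, offset)
    finally:
        # Release the same range.
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)


path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical demo file
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.write(fd, b"0123456789")
locked_write(fd, 4, b"XY")
content = os.pread(fd, 10, 0)
os.close(fd)
```

These locks are advisory: they only protect against other processes that take the same lock before touching the range.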
If you are writing the application yourself, then there are a lot of things you can do to make performance better:
- have some central thread assign a "write slot number" to each writer (essentially an incrementing integer), and then the client writes to offset = recordsize * slot number. Beyond assigning the slot number (which could be done in batches for better performance), there is no contention between clients. In most HPC applications the threads use the MPI rank as the slot number, since it is unique, and threads on the same node will typically be assigned adjacent slots so Lustre can further aggregate the writes. That doesn't work if you use a producer/consumer model where threads may produce variable numbers of records.
- make the IO recordsize a multiple of 4KiB in size to avoid contention between threads. Otherwise, the clients or servers will be forced to do read-modify-write for the partial records in a disk block, which is inefficient.
- Depending on whether your workflow allows it or not, rather than doing read and write into the same file, it will probably be more efficient to write a bunch of records into one file, then process the file as a whole and write into a second file. Not that Lustre can't do concurrent read and write to a single file, but this causes unnecessary contention that could be avoided.
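The "write slot number" scheme from the first two bullets can be sketched as follows, in Python and against a local temporary file rather than Lustre. RECORD_SIZE, the helper names and the file name are assumptions made for the example; in a real setup the slot numbers would come from the central assigner or the MPI rank.

```python
import os
import tempfile

RECORD_SIZE = 4096  # a multiple of 4 KiB, as suggested above


def write_record(fd, slot, payload):
    # Pad the payload to a full record and write it at its slot's offset.
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than record size")
    record = payload.ljust(RECORD_SIZE, b"\0")
    os.pwrite(fd, record, RECORD_SIZE * slot)


def read_record(fd, slot):
    # Read one record back and strip the padding.
    return os.pread(fd, RECORD_SIZE, RECORD_SIZE * slot).rstrip(b"\0")


path = os.path.join(tempfile.mkdtemp(), "records.dat")  # hypothetical file
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
# Writers may finish in any order; their slots never overlap.
for slot, payload in [(2, b"third"), (0, b"first"), (1, b"second")]:
    write_record(fd, slot, payload)
records = [read_record(fd, s) for s in range(3)]
size = os.fstat(fd).st_size
os.close(fd)
```

Because each writer owns offset = RECORD_SIZE * slot, no two writes touch the same bytes, which is the non-overlapping case the answer says Lustre keeps consistent without application-level locks.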

Parallel calls to google.appengine.api.channel.send_message

I am using send_message(client_id, message) in google.appengine.api.channel to fan out messages. The most common use case is two users. A typical trace looks like the following:
The two calls to send_message are independent. Can I perform them in parallel to save latency?
Well there's no async api available, so you might need to implement a custom solution.
Have you already tried native threading? It could work in theory, but because of the GIL it only helps if the xmpp api blocks on I/O, which I'm not sure it does.
A custom implementation will invariably come with some overhead, so it might not be the best idea for your simple case, unless it breaks the experience for the >2 user cases.
There is, however, another problem that might make it worth your while: what happens if the instance crashes and only got to send the first message? The api isn't transactional, so you should have some kind of safeguard. Maybe a simple recovery mode will suffice, given how infrequently this will happen, but I'm willing to bet a transactional message channel sounds more appealing, right?
Two ways you could go about it, off the top of my head:
Push a task for every message, they're transactional and guaranteed to run, and will execute in parallel with a fairly identical run time. It'll increase the time it takes for the first message to go out but will keep it consistent between all of them.
Use a service built for this exact use case, like firebase (though it might even be too powerful lol). In my experience the channel api is not very consistent and the performance is underwhelming for gaming, so this might make your system even better.
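The native-threading idea from this answer could look roughly like the sketch below. The send_message function here is a hypothetical stand-in that just sleeps, not the real google.appengine.api.channel call, so whether the real call releases the GIL while waiting (and whether threads are allowed in your runtime) is still the open question raised above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_message(client_id, message):
    # Hypothetical stand-in for channel.send_message; sleeps to mimic I/O latency.
    time.sleep(0.05)
    return (client_id, message)


def fan_out(client_ids, message):
    # Issue one send per recipient on a small thread pool and wait for all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(client_ids))) as pool:
        return list(pool.map(lambda cid: send_message(cid, message), client_ids))


sent = fan_out(["alice", "bob"], "hello")
```

With two recipients the wall time is roughly that of the slower send instead of the sum, but a crash between sends still leaves the fan-out half done, which is why the task-queue option above remains attractive.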
Fixed that for you
I just posted a patch on googleappengine issue 9157, adding:
channel.send_message_async for asynchronously sending a message to a recipient.
channel.send_message_multi_async for asynchronously broadcasting a single message to multiple recipients.
Some helper methods to make those possible.
Until the patch is pushed into the SDK, you'll need to include the channel_async.py file (that's attached on that thread).
Usage
import channel_async as channel
# this is synchronous D:
channel.send_message(<client-id>, <message>)
# this is asynchronous :D
channel.send_message_async(<client-id>, <message>)
# this is good for broadcasting a single message to multiple recipients
channel.send_message_multi_async([<client-id>, <client-id>], <message>)
# or
channel.send_message_multi_async(<list-of-client-ids>, <message>)
Benefits
Speed comparison on production:
Synchronous model: 2 - 80 ms per recipient (and blocking -.-)
Asynchronous model: 0.15 - 0.25 ms per recipient

When is it better to write single-threaded event servers, etc.

When is it better to write a single-threaded event server, a multithreaded server, or a forking server? Which approach is more appropriate for:
a web server that serves static files
a web server that serves static files and proxies requests to other HTTP servers
the previous, plus the server performs some logic in C without touching the hard drive or network
the previous, plus it makes some requests to MySQL?
For example, the language is C/C++. I very much want to understand this question. Thank you.
Given that modern processors are multicore, you should always write code under the assumption that it may at some point be multithreaded (make things reentrant wherever possible, and where not possible, use interfaces that would make it easy to drop-in an implementation that does the necessary locking).
If you have a fairly small number of queries-per-second (qps), then you might be able to get away with a single threaded event server. I would still recommend writing the code in a way that allows for multithreading, though, in the event that you increase the qps and need to handle more events.
A single threaded server can be scaled up by converting it into a forked server; however, forked servers use more resources -- any memory that is common to all requests ends up duplicated in memory when using forked servers, whereas only one copy of this data is required in a single multithreaded server. Also, a multithreaded server can deliver lower latency if it is taking advantage of true parallelism and can actually perform multiple parts of a single request processing in parallel.
There is a nice discussion on Single Thread vs. MultiThreaded server on joelonsoftware.

Implementing multithreaded application under C

I am implementing a small database like MySQL. It's part of a larger project.
Right now I have designed the core database, by which I mean I have implemented a parser and can now execute some basic SQL queries on my database. It can store, update, delete and retrieve data from files. So far that's fine; however, I want to make it available over the network.
I want more than one user to be able to access my database server and execute queries on it at the same time. I am working under Linux, so portability is not an issue right now.
I know I need to use sockets, which is fine. I also know that I need to use something like a thread pool, where I create a maximum number of threads initially and then, for each client request, wake up a thread and assign it to the client.
What I am unable to figure out is how all this is actually going to be bundled together. Where should I implement multithreading: on the client side or the server side? How is my parser going to be configured to take input from each of the clients separately (mostly via files, I think)?
If anyone has an idea about how I can implement this, please tell me, because I am stuck here in this project.
Thanks.. :)
If you haven't already, take a look at Beej's Guide to Network Programming to get your hands dirty in some socket programming.
Next I would take his example of a stream client and server and just use that as a single threaded query system. Once you have got this down, you'll need to choose whether you're going to actually use threads or use select(). My gut says your on-disk database doesn't yet support parallel writes (maybe reads), so a single server thread servicing requests is likely your best bet for starters!
In the multiple client model, you could use a simple per-socket hashtable of client information and return any results immediately when you process their query. Once you get into threading with the networking and db queries, it can get pretty complicated. So work up from the single client, add polling for multiple clients, and then start reading up on and tackling threaded (probably with pthreads) client-server models.
Server side, as the server is the only party that understands the data. You need to design locks or come up with your own model to make sure that modifications don't affect the clients currently being served.
As an alternative to multithreading, you might consider event-based single threaded approach (e.g. using poll or epoll). An example of a very fast (non-SQL) database which uses exactly this approach is redis.
This design has two obvious disadvantages: you only ever use a single CPU core, and a lengthy query will block other clients for a noticeable time. However, if queries are reasonably fast, nobody will notice.
On the other hand, the single thread design has the advantage of automatically serializing requests. There are no ambiguities, no locking needs. No write can come in between a read (or another write), it just can't happen.
If you don't have something like a robust, working MVCC built into your database (or are at least working on it), knowing that you need not worry can be a huge advantage. Concurrent reads are not so much an issue, but concurrent reads and writes are.
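As a rough illustration of the event-based single-threaded design, here is a minimal sketch in Python using the selectors module (which uses epoll on Linux when available). The tiny SET/GET protocol and the in-memory store are invented for the example, socketpair() stands in for accepted client connections, and each recv is assumed to deliver one whole command line, which a real server could not rely on.

```python
import selectors
import socket

store = {}  # shared state; only the single loop thread touches it


def handle(conn):
    # Parse one "SET key value" or "GET key" line and reply.
    line = conn.recv(1024).decode().strip()
    parts = line.split(" ", 2)
    if parts[0] == "SET" and len(parts) == 3:
        store[parts[1]] = parts[2]
        reply = "OK"
    elif parts[0] == "GET" and len(parts) == 2:
        reply = store.get(parts[1], "(nil)")
    else:
        reply = "ERR"
    conn.sendall((reply + "\n").encode())


def run_once(sel, timeout=1.0):
    # One loop iteration: handle every ready socket, one after another.
    for key, _ in sel.select(timeout):
        handle(key.fileobj)


sel = selectors.DefaultSelector()
# socketpair() stands in for two accepted client connections.
client_a, server_a = socket.socketpair()
client_b, server_b = socket.socketpair()
sel.register(server_a, selectors.EVENT_READ)
sel.register(server_b, selectors.EVENT_READ)

client_a.sendall(b"SET answer 42\n")
run_once(sel)
client_b.sendall(b"GET answer\n")
run_once(sel)
replies = (client_a.recv(1024), client_b.recv(1024))

sel.close()
for s in (client_a, server_a, client_b, server_b):
    s.close()
```

Since handle() runs to completion before the next ready socket is looked at, requests are serialized without any locks; the price is that one slow handle() call delays everyone else.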
Alternatively, you might consider doing the input/output and syntax checking in one thread, and running the actual queries in another (query passed via a queue). That, too, will remove the synchronisation woes, and it will at least offer some latency hiding and some multi-core.
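The two-thread split described above might be sketched like this in Python. The put/get operations, the table dictionary and the submit helper are hypothetical; the point is only that parsed queries cross a queue, and a single query thread owns the data.

```python
import queue
import threading

table = {}  # touched only by the query thread, so it needs no lock
requests = queue.Queue()


def query_worker():
    # Execute queries strictly one at a time, in arrival order.
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        (op, key, value), reply = item
        if op == "put":
            table[key] = value
            reply.put("ok")
        else:
            reply.put(table.get(key))


def submit(op, key, value=None):
    # Called from the I/O side: enqueue a parsed query and wait for its result.
    reply = queue.Queue(maxsize=1)
    requests.put(((op, key, value), reply))
    return reply.get()


worker = threading.Thread(target=query_worker)
worker.start()
first = submit("put", "a", 1)
second = submit("get", "a")
requests.put(None)
worker.join()
```

In a real server the I/O thread would not block in submit() but would poll the reply queues alongside its sockets; the blocking version just keeps the sketch short.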
