I want to use the libcurl library to post data to 4 URLs simultaneously, every 30-120 seconds or so.
Which is faster in this case: using libcurl_easy manually or using libcurl_multi? The docs are very sparse and I haven't found a real answer anywhere. I just want to know which would be faster, even if it's only by a very small margin.
Also, I know libcurl handles keep connections alive as long as I don't reset them, so in my case they won't time out between requests?
Edit: I realise it may seem illogical to optimise this when it only runs every few seconds, but when I post, it has to be as fast as possible.
There's really no speed difference between the easy and the multi interface. The easy interface is actually internally implemented as a wrapper around the multi interface so eventually they're running the same code anyway.
The multi interface offers a non-blocking API for doing many transfers in parallel. If you just want to do a single request in a synchronous fashion, there's really no reason not to just go with the easiest: the easy interface.
There is a "Don’t use any blocking I/O functions in Aerys." warning at https://amphp.org/aerys/io#blocking-io. Should I use PPM instead of Aerys if I need to use PDO (e.g., Prooph components) and want to reuse an initialized application instance for handling different requests?
I'm not bound to any existing PPM adapter (e.g., Symfony). Is there a way to reuse Aerys code (e.g., Router) for request-response logic when using PPM on top of Aerys (https://github.com/php-pm/php-pm/pull/267)?
If you want to use blocking functions, you can simply increase the worker count using the -w switch of the command-line script. It's definitely not optimal, but with enough workers the blocking shouldn't be too noticeable, apart from the increased latency that might occur.
Another possibility is to move the blocking calls into one or multiple worker threads with amphp/parallel.
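Aerys and amphp/parallel are PHP libraries, so the following is only a Python analogy (asyncio plus a thread pool instead of Amp) of moving blocking calls into worker threads. `blocking_query` is a hypothetical stand-in for a blocking PDO call; the sketch shows the event loop staying responsive (the heartbeat keeps ticking) while three such calls run in parallel.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(n: int) -> int:
    # Hypothetical stand-in for a blocking call such as a PDO query.
    time.sleep(0.2)
    return n * n


async def heartbeat(ticks: list) -> None:
    # Keeps running while the queries execute, showing the loop is not blocked.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def handle_requests(ns):
    loop = asyncio.get_running_loop()
    ticks = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking call runs in a worker thread; awaiting it yields to the loop.
        jobs = [loop.run_in_executor(pool, blocking_query, n) for n in ns]
        results, _ = await asyncio.gather(asyncio.gather(*jobs), heartbeat(ticks))
    return results, ticks


if __name__ == "__main__":
    start = time.monotonic()
    results, ticks = asyncio.run(handle_requests([1, 2, 3]))
    print(results, f"{time.monotonic() - start:.2f}s", len(ticks))
```

Because the three 0.2 s calls overlap, the whole batch takes roughly 0.2 s instead of 0.6 s.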
As long as the responses are relatively fast, everything should be fine. Problems begin under heavy load, when things get slower and might time out, because each blocking call then holds up a worker for a very long time.
PHP-PM doesn't offer much benefit over using Aerys directly. It redirects requests to a currently free worker, but under high enough load the kernel's load balancing will probably be good enough, and it's unlikely that all the slower requests will be routed to one worker. In fact, using Aerys will probably be better, because it's production ready and has multiple independent workers instead of one master that might become a bottleneck. PHP-PM could solve that in a better way, but it's currently not implemented. Additionally, Aerys supports keep-alive connections, which PHP-PM currently does not.
I am using send_message(client_id, message) in google.appengine.api.channel to fan out messages. The most common use case is two users. A typical trace looks like the following:
The two calls to send_message are independent. Can I perform them in parallel to save latency?
Well, there's no async API available, so you might need to implement a custom solution.
Have you already tried native threading? It could work in theory, but because of the GIL it only helps if the XMPP API blocks on I/O (which releases the GIL), and I'm not sure it does.
A custom implementation will invariably come with some overhead, so it might not be the best idea for your simple case, unless it breaks the experience for the >2 user cases.
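A minimal sketch of the native-threading idea in Python 3, assuming the real `send_message` releases the GIL while it waits on the network (the open question above). `fake_send_message` is a hypothetical stand-in that sleeps to simulate latency, so the timing can be checked locally; swap in the real call to try it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_send_message(client_id: str, message: str) -> str:
    # Hypothetical stand-in for channel.send_message; time.sleep releases
    # the GIL just as a blocking network call would.
    time.sleep(0.2)
    return client_id


def send_to_all(client_ids, message, send=fake_send_message):
    # One thread per recipient: total latency is roughly the slowest send,
    # not the sum of all sends.
    with ThreadPoolExecutor(max_workers=max(1, len(client_ids))) as pool:
        futures = [pool.submit(send, cid, message) for cid in client_ids]
        # result() re-raises any exception raised in a worker thread.
        return [f.result() for f in futures]
```

For two recipients this takes about one send's latency instead of two, at the cost of thread start-up overhead.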
There is, however, another problem that might make it worth your while: what happens if the instance crashes after sending only the first message? The API isn't transactional, so you should have some kind of safeguard. Maybe a simple recovery mode will suffice, given how infrequently this will happen, but I'm willing to bet a transactional message channel sounds more appealing, right?
Two ways you could go about it, off the top of my head:
Push a task for every message: tasks are transactional and guaranteed to run, and they will execute in parallel with roughly the same run time. It'll increase the time it takes for the first message to go out, but it will keep that time consistent across all of them.
Use a service built for this exact use case, like Firebase (though it might even be too powerful lol). In my experience the Channel API is not very consistent and its performance is underwhelming for gaming, so this might make your system even better.
Fixed that for you
I just posted a patch on googleappengine issue 9157, adding:
channel.send_message_async for asynchronously sending a message to a recipient.
channel.send_message_multi_async for asynchronously broadcasting a single message to multiple recipients.
Some helper methods to make those possible.
Until the patch is pushed into the SDK, you'll need to include the channel_async.py file (that's attached on that thread).
Usage
import channel_async as channel
# placeholder values for illustration
client_id = 'client-id'
client_ids = ['client-id-1', 'client-id-2']
message = 'hello'
# this is synchronous D:
channel.send_message(client_id, message)
# this is asynchronous :D
channel.send_message_async(client_id, message)
# this is good for broadcasting a single message to multiple recipients
channel.send_message_multi_async(['client-id-1', 'client-id-2'], message)
# or
channel.send_message_multi_async(client_ids, message)
Benefits
Speed comparison on production:
Synchronous model: 2 - 80 ms per recipient (and blocking -.-)
Asynchronous model: 0.15 - 0.25 ms per recipient
There is appengine-mapreduce, which seems to be the official way to do things on AppEngine. But there seems to be no documentation besides some hacked-together wiki pages and lengthy videos. There are statements that the lib only supports the map step, but the source indicates that there are also implementations for shuffle.
A version of this appengine-mapreduce library also seems to be included in the SDK, but it is not blessed for public use, so you are basically expected to load the library twice into your runtime.
Then there is appengine-pipeline: "A primary use-case of the API is connecting together various App Engine MapReduces into a computational pipeline." But there also seems to be pipeline-related code in the appengine-mapreduce library.
So where do I start to find out how this all fits together? Which is the library to call from my project? Is there any decent documentation on appengine-mapreduce besides parsing change logs?
Which is the library to call from my project.
They serve different purposes, and you've provided no details about what you're attempting to do.
The most fundamental layer here is the task queue, which lets you schedule background work that can be highly parallelized. This is fan-out. Let's say you had a list of 1000 websites, and you wanted to check the response time for each one and send an email for any site that takes more than 5 seconds to load. By running these as concurrent tasks, you can complete the work much faster than if you checked all 1000 sites in sequence.
Now let's say you don't want to send an email for every slow site, you just want to check all 1000 sites and send one summary email that says how many took more than 5 seconds and how many took fewer. This is fan-in. It's trickier with the task queue, because you need to know when all tasks have completed, and you need to collect and summarize their results.
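To make the fan-out/fan-in distinction concrete, here is a local Python analogy that uses a thread pool instead of the App Engine task queue. `check_site` is a hypothetical stand-in that returns a precomputed load time rather than fetching a URL; the 5-second threshold comes from the example above.

```python
from concurrent.futures import ThreadPoolExecutor

SLOW_THRESHOLD = 5.0  # seconds, as in the example above


def check_site(site):
    # Hypothetical stand-in: a real task would time an HTTP request.
    name, load_time = site
    return name, load_time


def fan_out(sites):
    # Fan-out: every check runs independently and concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(check_site, sites))


def fan_in(results):
    # Fan-in: only possible once all checks have finished and been collected.
    slow = sum(1 for _, t in results if t > SLOW_THRESHOLD)
    return {"slow": slow, "fast": len(results) - slow}
```

For example, `fan_in(fan_out([("a.example", 1.2), ("b.example", 7.5), ("c.example", 3.0)]))` returns `{"slow": 1, "fast": 2}`. Here the executor hides the "wait for everything" step; with raw tasks you have to track completion yourself, which is what the Pipeline API automates.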
Enter the Pipeline API. The Pipeline API abstracts the task queue to make fan-in easier. You write what looks like synchronous, procedural code, but it uses Python futures and is executed (as much as possible) in parallel. The Pipeline API keeps track of task dependencies and collects results to facilitate building distributed workflows.
The MapReduce API wraps the Pipeline API to facilitate a specific type of distributed workflow: mapping the results of a piece of work into a set of key/value pairs, and reducing multiple sets of results to one by combining their values.
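A minimal in-memory sketch of the map, shuffle, and reduce steps in plain Python, using word counting as the usual illustration. This is not the appengine-mapreduce API (its handler signatures aren't shown here); it only shows the shape of the computation that library distributes across tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_step(document: str) -> Iterator[Tuple[str, int]]:
    # Map: turn one piece of input into (key, value) pairs.
    for word in document.lower().split():
        yield word, 1


def shuffle_step(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all values that share a key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


def map_reduce(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_step(doc))
    grouped = shuffle_step(pairs)
    return dict(reduce_step(k, v) for k, v in grouped.items())
```

For example, `map_reduce(["the cat", "The dog"])` returns `{"the": 2, "cat": 1, "dog": 1}`.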
So they provide increasing layers of abstraction and convenience around a common system of distributed task execution. The right solution depends on what you're trying to accomplish.
There is official documentation here: https://developers.google.com/appengine/docs/java/dataprocessing/
I am implementing a small database like MySQL. It's part of a larger project.
Right now I have designed the core database, by which I mean I have implemented a parser and can now execute some basic SQL queries on my database. It can store, update, delete and retrieve data from files. So far it's fine; however, I want to make it work over a network.
I want more than one user to be able to access my database server and execute queries on it at the same time. I am working under Linux, so portability is not an issue right now.
I know I need to use sockets, which is fine. I also know that I need to use a concept like a thread pool, where I create a maximum number of threads initially and then, for each client request, wake up a thread and assign it to the client.
What I am unable to figure out for now is how all this is actually going to be bundled together. Where should I implement multithreading: on the client side or the server side? How is my parser going to be configured to take input from each of the clients separately? (Mostly via files, I think?)
If anyone has an idea about how I can implement this, please tell me, because I am stuck on this part of the project.
Thanks :)
If you haven't already, take a look at Beej's Guide to Network Programming to get your hands dirty in some socket programming.
Next, I would take his example of a stream client and server and just use that as a single-threaded query system. Once you have that down, you'll need to choose whether you're actually going to use threads or use select(). My gut says your on-disk database doesn't yet support parallel writes (maybe reads), so a single server thread servicing requests is likely your best bet for starters!
In the multiple client model, you could use a simple per-socket hashtable of client information and return any results immediately when you process their query. Once you get into threading with the networking and db queries, it can get pretty complicated. So work up from the single client, add polling for multiple clients, and then start reading up on and tackling threaded (probably with pthreads) client-server models.
Server side, as the server is the only party that can understand the information. You need to design locks or come up with your own model to make sure that modifications/edits don't affect the clients being served.
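A minimal sketch of server-side locking, assuming a thread-pool server like the one described in the question. `Table`, `increment` and `serve_clients` are hypothetical names; the point is that a read-modify-write such as `UPDATE t SET n = n + 1` must happen under a lock so that concurrent clients don't lose each other's writes. One lock is the simplest model; a reader/writer lock would let reads proceed in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Table:
    # Hypothetical in-memory table guarded by a single lock.
    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def select(self, key):
        with self._lock:
            return self._rows.get(key, 0)

    def increment(self, key):
        # Read and write happen under the same lock, so no update is lost.
        with self._lock:
            current = self._rows.get(key, 0)
            self._rows[key] = current + 1


def serve_clients(table, requests, workers=8):
    # Each request is handled by a pool thread, as in a thread-pool server.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for key in requests:
            pool.submit(table.increment, key)
```

After `serve_clients(table, ["hits"] * 1000)`, `table.select("hits")` is 1000 regardless of how the threads interleave.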
As an alternative to multithreading, you might consider event-based single threaded approach (e.g. using poll or epoll). An example of a very fast (non-SQL) database which uses exactly this approach is redis.
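A minimal sketch of the single-threaded, event-based design using Python's `selectors` module, which picks epoll on Linux. It also keeps a per-socket dictionary of client state, as suggested in the earlier answer. Assumptions: each query is one newline-terminated line, and `handle_query` is a hypothetical placeholder for the real parser and executor.

```python
import selectors
import socket
import threading


def handle_query(query: str) -> str:
    # Hypothetical placeholder for the database's parser/executor.
    return "OK " + query.strip().upper()


def serve(listener: socket.socket, handle, stop: threading.Event) -> None:
    sel = selectors.DefaultSelector()  # epoll on Linux
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    buffers = {}  # per-client state: bytes received but not yet parsed
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.1):
            sock = key.fileobj
            if key.data is None:
                # New client: accept it and watch it for readable data.
                conn, _ = sock.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
                buffers[conn] = b""
                continue
            chunk = sock.recv(4096)
            if not chunk:
                # Client closed the connection: drop its state.
                sel.unregister(sock)
                del buffers[sock]
                sock.close()
                continue
            buffers[sock] += chunk
            # Run every complete query; one thread means queries never overlap.
            while b"\n" in buffers[sock]:
                line, buffers[sock] = buffers[sock].split(b"\n", 1)
                reply = handle(line.decode())
                # Simplification: a production server would queue output and
                # wait for EVENT_WRITE rather than call sendall on a
                # non-blocking socket.
                sock.sendall((reply + "\n").encode())
    sel.close()
```

A slow `handle` call stalls every client, which is exactly the trade-off discussed next.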
This design has two obvious disadvantages: you only ever use a single CPU core, and a lengthy query will block other clients for a noticeable time. However, if queries are reasonably fast, nobody will notice.
On the other hand, the single thread design has the advantage of automatically serializing requests. There are no ambiguities, no locking needs. No write can come in between a read (or another write), it just can't happen.
If you don't have something like a robust, working MVCC built into your database (or are at least working on it), knowing that you need not worry can be a huge advantage. Concurrent reads are not so much an issue, but concurrent reads and writes are.
Alternatively, you might consider doing the input/output and syntax checking in one thread, and running the actual queries in another (query passed via a queue). That, too, will remove the synchronisation woes, and it will at least offer some latency hiding and some multi-core.
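A minimal sketch of that last suggestion: any number of I/O threads hand each query to a single executor thread through a `queue.Queue`, so queries run strictly one at a time and the data needs no locks. `execute` and its `SET key value` / `GET key` command format are hypothetical placeholders for the real query engine.

```python
import queue
import threading


def execute(store, query):
    # Hypothetical query engine: "SET key value" or "GET key".
    parts = query.split()
    if not parts:
        return "ERROR"
    if parts[0] == "SET" and len(parts) == 3:
        store[parts[1]] = parts[2]
        return "OK"
    if parts[0] == "GET" and len(parts) == 2:
        return store.get(parts[1], "NULL")
    return "ERROR"


def query_worker(jobs):
    # The only thread that touches the store, so no locking is needed.
    store = {}
    while True:
        item = jobs.get()
        if item is None:  # shutdown sentinel
            break
        query, reply_box = item
        reply_box.put(execute(store, query))


def submit(jobs, query):
    # Called from an I/O thread: syntax checks would go here, then it
    # blocks until the query thread posts the result.
    reply_box = queue.Queue(maxsize=1)
    jobs.put((query, reply_box))
    return reply_box.get()
```

Start one `threading.Thread(target=query_worker, args=(jobs,))`, call `submit` from as many client threads as you like, and put `None` on the queue to shut it down.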
I have a certain service where specific functions will take longer to call than others, sometimes they might take seconds to return. In order to prevent the client's UI being blocked when this happens what is the preferred solution:
Use a Duplex channel and simply use the callbacks to update the UI when data is received.
Use a separate thread to call the service, and simply use request-reply operations, and then update the ui thread when data is returned.
Which solution is better, particularly when interoperability is favored but not strictly necessary, and in your opinion, which one is faster (and cleaner) to implement and maintain?
If you implement callback contracts, you remove the need for the client to implement multithreading code. This might not be a significant advantage when working with .NET clients (as VS will auto-generate the async proxy code for you), though it could prove beneficial when working with clients on other platforms/languages.
Which one is cleaner? Well, that depends on whether you are a client or server developer. If, as I suspect in your case, you are both, and you can just use .NET for client and server, then I'd probably be tempted to avoid callbacks for now. If you had said the service calls were taking 45 seconds, I'd say callback contracts. It really is subjective, but if I were to stick my neck out, I'd say that once responses take longer than 5 seconds it is time to move to callbacks.
You should implement a CallbackContract.
Here is an example.