I understand that, in Hystrix, we execute a command against a dependency. Let's say that dependency is exposed as a web service endpoint (REST). In my case that dependency is a server farm (a pool of servers set up for high availability). Can I pass a pool of dependencies to Hystrix rather than a single one? What I expect from Hystrix is that it would start invoking endpoints randomly, evaluate them over time, and break the circuit (if required) once ALL of them are exhausted, based on factors such as timeouts, thread-pool exhaustion, errors, etc.
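For context, Hystrix itself is agnostic about what happens inside a command's run() method, so one common approach is to do the endpoint selection there. The following is only a minimal sketch, assuming Hystrix 1.x and plain HttpURLConnection for brevity; the class name, endpoint list, and random selection are illustrative, not a built-in Hystrix feature. With a single command key, the circuit reflects the health of the farm as a whole; per-server circuits would need one command key per endpoint.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.ThreadLocalRandom;

// Wraps a call to ANY server in the farm; Hystrix tracks failures for the
// command key as a whole, so the circuit opens only when the farm as a whole
// keeps failing (timeouts, errors, thread-pool exhaustion).
public class FarmCommand extends HystrixCommand<String> {

    private final List<String> endpoints;  // e.g. "http://host1:8080/api", ...

    public FarmCommand(List<String> endpoints) {
        super(HystrixCommandGroupKey.Factory.asKey("ServerFarm"));
        this.endpoints = endpoints;
    }

    @Override
    protected String run() throws Exception {
        // Pick one endpoint at random; a smarter load-balancing policy could go here.
        String target = endpoints.get(ThreadLocalRandom.current().nextInt(endpoints.size()));
        HttpURLConnection conn = (HttpURLConnection) new URL(target).openConnection();
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(1000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in).useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    @Override
    protected String getFallback() {
        return "fallback-response";  // served while the circuit is open
    }
}

// Usage: new FarmCommand(endpoints).execute();
```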
From the documentation on how GAE Flexible handles requests, it says that "An instance can handle multiple requests concurrently" but I don't know what this exactly means.
Let's say my application can process a single request every 60 seconds.
After starting to process the initial request, will another request (or three) that arrives, say, 30 seconds later (so halfway through the first request) be handled by the same instance, or will it trigger autoscaling and spin up more instances to handle those new requests? This situation assumes that CPU utilization during the first request is still below the scaling CPU-utilization threshold.
I'm worried that because it takes my instance 60 seconds to process a single request and I will be receiving multiple requests at a time, I'll be needlessly triggering autoscaling even when there is enough processing power to handle additional requests on the same instance. Is this how it works? I would ideally like to be able to multi-thread my processing and accept additional requests on the same instance while still under the CPU utilization threshold.
The documentation on concurrent requests is scarce for the Flexible environment, unlike the Standard environment, so I want to be sure.
Perhaps 'number of workers' is the config setting you're looking for:
https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration
Gunicorn uses workers to handle requests. By default, Gunicorn uses sync workers. This worker class is compatible with all web applications, but each worker can only handle one request at a time. By default, gunicorn only uses one of these workers. This can often cause your instances to be underutilized and increase latency in applications under high load.
And it sounds like you've already seen that you can specify the cpu utilization threshold:
https://cloud.google.com/appengine/docs/flexible/python/reference/app-yaml#automatic_scaling
You can also use something other than gunicorn if you prefer. Here's one of their examples where they use Honcho instead:
https://github.com/GoogleCloudPlatform/getting-started-python/blob/master/6-pubsub/app.yaml
https://ringpop.readthedocs.org/en/latest/
To my understanding, sharding can be implemented in library routines, and the application programs are just linked with the library. If the library is an RPC client, the sharding layout can be queried from the server side in real time, so even when a new partition appears, it is transparent to the applications.
Ringpop is an application-layer sharding strategy based on the SWIM membership protocol. I wonder: what is the major advantage of doing this at the application layer?
And what would the other side look like, say, sharding at the system layer?
Thanks!
Maybe a bit late for this reply, but maybe someone still needs this information.
Ringpop introduces the idea of sharding the application rather than the data. It works more or less like application-level middleware, with the advantage that it offers an easy way to build scalable and fault-tolerant applications.
What Ringpop shards are the requests coming from clients to a specific service. This is one of its major advantages (there are more, keep reading).
In a traditional SOA architecture, all requests for a specific service go to a single system that dispatches them among the workers for load balancing. These workers do not know each other; they are independent entities that cannot communicate with one another. They do their job and send back a reply.
Ringpop is the opposite: the workers know each other, can discover new ones, regularly talk to one another to check their health status, and spread this information to the other workers.
How does Ringpop shard the requests?
It uses the concept of keyspaces. A keyspace is just a range of numbers; you are free to choose whatever range you like, but the obvious choice is to hash the IDs of the objects in the application and use the hash function's codomain as the range.
A keyspace can be imagined as a hash "ring", but in practice it is just a 4- or 8-byte integer range.
A worker, i.e. a node that can serve a request for a specific service, is 'virtually' placed on this ring: it owns a contiguous portion of the ring. In practice, it is assigned a sub-range, and it is in charge of handling all the requests belonging to that sub-range. Handling a request means one of two things (a rough sketch of this decision follows the list below):
- process the request and provide a response, or
- forward the request to the worker that actually owns it
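To make the handle-or-forward decision concrete, here is a minimal Java sketch of a hash ring for illustration only. Ringpop itself is a Node.js library; the CRC32 hash, the addresses, and the single position per worker are simplifications (real implementations typically use many virtual points per worker for better balance).

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Toy hash ring: each worker owns the arc between its predecessor's position
// and its own, and a request is processed locally only if this worker owns
// the request's key; otherwise it is forwarded to the owner.
public class HashRing {

    private final TreeMap<Long, String> ring = new TreeMap<>();  // position -> worker address

    private static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();  // 32-bit keyspace: 0 .. 2^32 - 1
    }

    public void addWorker(String address)    { ring.put(hash(address), address); }
    public void removeWorker(String address) { ring.remove(hash(address)); }

    // The owner is the first worker at or after the key's position,
    // wrapping around to the start of the ring if necessary.
    public String ownerOf(String requestKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no workers in the ring");
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Handle-or-forward decision made by every worker for every request.
    public void handle(String requestKey, String selfAddress) {
        String owner = ownerOf(requestKey);
        if (owner.equals(selfAddress)) {
            System.out.println("processing " + requestKey + " locally");
        } else {
            System.out.println("forwarding " + requestKey + " to " + owner);  // an RPC in real life
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addWorker("10.0.0.1:3000");
        ring.addWorker("10.0.0.2:3000");
        ring.addWorker("10.0.0.3:3000");
        ring.handle("user-42", "10.0.0.1:3000");
    }
}
```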
Every application is built with this behaviour embedded: there is logic to handle a request, or simply forward it to the worker that can handle it. The forwarding mechanism is nothing more than a remote procedure call, which is made using TChannel, Uber's high-performance networking protocol for general RPC.
If you think about this, you can see that Ringpop offers something very nice that traditional SOA architectures do not have: clients don't need to know or care about which instance can serve their request. They can send a request to any node in the Ringpop cluster, and the receiving worker will either serve it or forward it to the right owner.
Ringpop has another interesting feature: new workers can dynamically join the ring and old workers can leave it (e.g. because of a crash or just a shutdown) without any service interruption.
Ringpop implements a membership protocol based on SWIM.
It enables workers to discover one another and to exclude a broken worker from the ring, using a TCP-based gossip protocol. When a new worker is discovered by another worker, a new connection is established between them. Every worker tracks the status of the other workers by sending ping requests at regular intervals, and spreads the status information to the other workers when a ping does not get a reply (membership updates are piggybacked on pings, gossip-style).
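Very roughly, the failure-detection loop can be pictured as in the Java sketch below. This is a simplification for illustration: the ping and gossip bodies are placeholders, and real SWIM (and Ringpop's variant) also uses indirect probes through other members and a "suspect" state before declaring a member faulty.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Skeleton of the failure-detection loop: every protocol period, pick a random
// peer, ping it, and if the ping fails mark it faulty and gossip that update
// to the other members.
public class MembershipLoop {

    enum Status { ALIVE, FAULTY }

    private final List<String> peers = new CopyOnWriteArrayList<>();
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void addPeer(String address) {
        peers.add(address);
        statuses.put(address, Status.ALIVE);
    }

    public void start() {
        scheduler.scheduleAtFixedRate(this::protocolPeriod, 0, 1, TimeUnit.SECONDS);
    }

    private void protocolPeriod() {
        if (peers.isEmpty()) return;
        String target = peers.get(ThreadLocalRandom.current().nextInt(peers.size()));
        if (ping(target)) {
            statuses.put(target, Status.ALIVE);
        } else {
            statuses.put(target, Status.FAULTY);
            gossip(target, Status.FAULTY);   // spread the bad news to the other members
        }
    }

    // Placeholders: a real implementation sends these over the network and
    // piggybacks recent membership updates on every ping and ack.
    private boolean ping(String peer) { return true; }
    private void gossip(String peer, Status status) { }
}
```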
These three elements, consistent hashing, request forwarding and a membership protocol, make Ringpop an interesting solution for achieving scalability and fault tolerance at the application layer while keeping complexity and operational overhead to a minimum.
I am writing an Apache 2.4 Module, and am using MPM worker.
Is there a hook I can use that gets called when a new thread is created, from the context of that thread? I need to do some per-thread initialization.
(More generally, is there a comprehensive list of hooks documented somewhere?)
The short answer is 'no': there is no such hook for thread initialization with the worker MPM. The Apache designers recommend that modules be 'MPM agnostic as much as possible'. The key concept is that modules must fit into the input filters / content generation / output filters architecture, independently of the MPM that is actually managing the workload.
Of course, there are cases where you do need to know which environment you're working in, though.
We are working on a similar problem. Threads are dispatched when requests come in; they run the hook registered with ap_hook_handler and, as far as I understand, that's the point where your thread must acquire or allocate the resources it will need in order to serve the request.
I've been told mod_rivet has an interesting solution: it creates its own thread pool and lets those threads exchange data with the Apache threads running the request handler.
Google says in the Addressing Backends chapter that, without targeting an instance by number, App Engine selects the first available instance of the backend. That makes me wonder: what is that "first available instance"? Is it instance #1, or is it picked by some other method?
The exact behavior of this depends on whether your instances are dynamic or resident.
For dynamic instances, the request goes to the first instance that can handle the request immediately. If there are no instances that can handle the request immediately, the request is queued or a new instance is started, depending on queueing settings.
For resident instances, the request is sent to the least-loaded backend instance.
The reason for the different behaviors is to make the best use of your instances: resident instances are there anyway, so they're utilized equally, while dynamic instances are spawned only as needed, so the scheduler tries to avoid spinning up new ones if it can.
Question/Environment
The goal of my web application is to be a handy interface to the database at our company.
I'm using:
Scalatra (as minimal web framework)
Jetty (as servlet container)
SBT (Simple Build Tool)
JDBC (to interface with the database)
One of the requirements is that each user can manage a number of concurrent queries, and that even when he/she logs off, the queries keep running and can be retrieved later on (or their completion status checked if they stopped for any reason).
I suppose the queries will likely have to run in their own separate threads.
I'm not even sure whether this issue is orthogonal to connection pooling (which I'm definitely going to use; BoneCP and C3P0 seem nice).
Summary
In short: I need to have very fine-grained control over the lifetime of database requests, and they cannot be bound to the servlet lifetime.
What ways are there to fulfill my requirements? I've searched quite a bit on Google and Stack Overflow and haven't found anything that addresses my problem. Is it even possible?
What is missing from your stack is a scheduler, e.g. http://www.quartz-scheduler.org/
A rough explanation:
Your connection pool (e.g. C3P0) will be bound to the application's lifecycle.
Your servlets will be sending query requests to the scheduler (these will be associated to the user requesting the query).
The scheduler will be executing queries as soon as possible, by using connections from the connection pool. It may also do so in a synchronized/serialized order (for each user).
The user will be able to see all query requests associated with him, probably with status (pending, completed with results etc).
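As a rough sketch of how the scheduler piece could look with Quartz (the QueryJob class, identifiers, and job-data keys below are made up for illustration; connection handling, error handling, and result persistence are omitted):

```java
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

// A job that runs one user query; it lives in the scheduler, not in the
// servlet request, so it keeps running after the user logs off.
public class QueryJob implements Job {
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        String user = context.getJobDetail().getJobDataMap().getString("user");
        String sql  = context.getJobDetail().getJobDataMap().getString("sql");
        // Borrow a connection from the pool, run the query, and persist the
        // result keyed by (user, job id) so it can be fetched on a later visit.
    }
}

class QuerySubmitter {
    public static void submit(Scheduler scheduler, String user, String sql) throws Exception {
        JobDetail job = JobBuilder.newJob(QueryJob.class)
                .withIdentity("query-" + System.nanoTime(), user)   // group jobs per user
                .usingJobData("user", user)
                .usingJobData("sql", sql)
                .build();
        Trigger trigger = TriggerBuilder.newTrigger().startNow().build();
        scheduler.scheduleJob(job, trigger);
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();   // bound to the application lifecycle, not to a servlet request
        submit(scheduler, "alice", "SELECT count(*) FROM big_table");
    }
}
```

Note that Quartz can also keep its job store in a database (JDBCJobStore), which helps if the "completion status can be checked later" requirement needs to survive an application restart.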