Apache 2.4 MPM worker: Thread Init? - c

I am writing an Apache 2.4 Module, and am using MPM worker.
Is there a hook I can use that gets called when a new thread is created, from the context of that thread? I need to do some per-thread initialization.
(More generally, is there a comprehensive list of hooks documented somewhere?)

The short answer is 'no': there is no such hook for thread initialization with the worker MPM. The Apache designers recommend that modules be 'MPM agnostic as much as possible'. The key concept is that modules must fit the input filters - content generation - output filters architecture independently of the MPM that is actually managing the workload.
Of course, there are cases where you need to know which environment you're working in, though.
We are working on a similar problem. Threads are fired when requests come in and run the hook registered with ap_hook_handler. From what I understand, that is the point at which your thread must acquire or allocate the resources it will need in order to serve requests.
I've been told mod_rivet has an interesting solution creating its own thread pool and letting them exchange data with the Apache threads running the request handler.

Related

Multiple Threads Trying to write on the same No-Sql Datastore at the same time

We have an internal NoSQL datastore at our company. It provides key-value storage, and the value is in JSON format. I will refer to it as the XYZ datastore.
Problem Statement:
I have an application which spawns 10-15 threads at a time. Each thread writes to the same XYZ datastore concurrently, though the records being PUT are different (meaning they have different keys). The XYZ REST client is a singleton, meaning all the threads use one client instance. (I am using Spring beans to create the singleton client.)
Observation: Only one thread is able to put records in XYZ. The other threads are not able to write to XYZ at all, not even after a delay.
How can I handle this concurrent writing to XYZ? What is the preferred way?
I can think of the following ways to achieve it:
1. Implement a lock around the PUT API on my end, so that even if concurrent threads attempt to write, only a single thread writes at a time and the others wait until the lock is released. I am not sure how to implement this. If anyone has pointers, that would be great.
2. Treat the threads above as producer threads and create one consumer thread. Each producer thread would put the record to be written in a queue, and the consumer thread would read records one by one and write them. Here, I would use java.util.concurrent.BlockingQueue as the queue shared by the consumer and producer threads. Is this the correct way?
Can anyone suggest which is the best way to do it?
BTW, the application is built in Java, using spring framework.
TIA
Even with a singleton, make sure you are not sharing any mutable state across threads and that each method in the class is self-contained. Only then can you safely make it a singleton, and you don't need to worry about blocking, since each thread simply makes a method call. You know from the method's response whether each transaction failed, so based on that you can treat the transaction as successful or re-initiate it later. No delay is required.
For the queuing approach, consider a Deque from the java.util.concurrent package; look at some examples and implement it. This is effective as well, but here you are sharing the queue across multiple threads. You won't notice any slowness, but sharing is happening. With a queue there is also no indication of whether the write succeeded for the values that were pushed. One thing you can do is remove a value from the queue only after it has been written successfully, and otherwise just peek at it. If a value keeps failing, move it to a separate queue for later use; that is one more thing to manage. No delay is required.
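The producer/consumer option discussed above can be sketched with a plain BlockingQueue. This is only a minimal sketch under assumptions: the Put class, the STOP sentinel, and the in-memory map standing in for the XYZ datastore are all hypothetical names introduced for illustration, and a real consumer would call the REST client and handle failures (for example by moving failed records to a retry queue).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class QueuedWriterSketch {
    // Hypothetical record: one key/value pair destined for the datastore.
    static final class Put {
        final String key;
        final String json;
        Put(String key, String json) { this.key = key; this.json = json; }
    }

    // Sentinel ("poison pill") telling the consumer to stop; compared by identity.
    private static final Put STOP = new Put("", "");

    // Starts `producers` producer threads and one consumer thread, then
    // returns the contents of the stand-in store once everything is written.
    static Map<String, String> run(int producers, int recordsPerProducer) {
        BlockingQueue<Put> queue = new LinkedBlockingQueue<>();
        Map<String, String> store = new ConcurrentHashMap<>(); // stand-in for XYZ

        // Single consumer: the only thread that ever writes to the store.
        Thread consumer = new Thread(() -> {
            try {
                for (Put p = queue.take(); p != STOP; p = queue.take()) {
                    store.put(p.key, p.json); // real code would call the REST client here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < producers; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                for (int j = 0; j < recordsPerProducer; j++) {
                    // The queue is unbounded in this sketch, so add() does not block.
                    queue.add(new Put("p" + id + "-" + j, "{\"n\":" + j + "}"));
                }
            });
            workers.add(t);
            t.start();
        }

        try {
            for (Thread t : workers) {
                t.join();          // wait until every producer has enqueued its records
            }
            queue.add(STOP);       // then ask the consumer to finish
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 5).size() + " records written");
    }
}
```

Because only the consumer thread touches the store, the producers never contend on the client; the trade-off is that a producer no longer learns directly whether its record was written.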

What are the Camel constructs for "long-running routes"?

We are investigating Camel for use in a new system; one of our situations is that a set of steps is started, and some of the steps in the set can take hours or days to execute. We need to execute other steps after such long-elapsed-time steps are finished.
We also need to have the system survive reboot while some of the "routes" with these steps are in progress, i.e., the system might be rebooted, and we need the "routes" in progress to remain in their state and pick up where they left off.
I know we can use a queued messaging system such as JMS, and that messages put into such a queue can be handled to be persisted. What isn't clear to me is how (or whether) that fits into Camel -- would we need to treat the step(s) after each queue as its own route, so that it could, on startup, read from the queue? That's going to split up our processing steps into many more 'routes' than we would have otherwise, but maybe that's the way it's done.
Is/are there Camel construct/constructs which assist with this kind of system? If I know their terms and basic outline, I can probably figure it out, but what I really need is an explanation of what the constructs do.
Camel is not a human workflow / long-lasting task system. For that kind of work, look at BPMS systems. Camel is a better fit for real-time / near-real-time integrations.
For long tasks, you persist their state in some external system such as a message broker, a database, or a BPMS, and then you can use Camel routes to process and move from one state to the next, or use Camel where it fits, such as integrating with the many different systems you can reach out of the box with the 200+ Camel components.
Camel does provide graceful shutdown, so you can safely shut down or reboot Camel. But if you are talking about surviving a system crash, unlikely as that is, you may want to look at transactions and idempotency.
You are referring to asynchronous processing of messages in routes. Camel has a couple of components that you can use to achieve this.
SEDA: In-memory, not persistent, and can only call endpoints in the same Camel context.
VM: In-memory, not persistent, and can call endpoints in different Camel contexts, but limited to the same JVM. This component is an extension of SEDA.
JMS: Persistence is configurable on the queue stack. Much more heavyweight but also more fault tolerant than SEDA/VM.
SEDA/VM can be used as low-overhead replacements for JMS components, and in some cases I would use them exclusively. In your case persistence is required, so SEDA/VM is not an option, but to keep things simple the examples will use SEDA, as you can get some basics up and running quickly.
The example assumes the following: we have a timer that kicks off, and then there are two processes it needs to run.
[Screenshot: a single route in which the timer is followed by the two process beans]
In this route the message flow is synchronous between the timer and the two process beans.
If we wanted to make these steps asynchronous, we would need to break each step into a route of its own. We would then connect these routes using one of the components listed at the beginning.
[Screenshot: three routes connected by the SEDA queues processOne and processTwo]
Notice we have three routes, and each route has only one "processing" step in it. Route Test only has a timer, which fires a message to the SEDA queue called processOne. This message is received on the SEDA queue and sent to the Process_One bean. After this it is sent to the SEDA queue called processTwo, where it is received and passed to the Process_Two bean. All this is done asynchronously.
You can replace the SEDA components with JMS once you get to understand the concepts. I suspect that state tracking is going to be the most complicated part as Camel makes the asynchronous part easy.

Is the camel-cxf consumer multithreaded, and how to check if a component uses multiple threads?

We are using camel-cxf as a consumer (SOAP) in our project and asked ourselves whether camel-cxf uses multiple threads to react to requests.
We think it uses multiple threads, right?!
But what does this mean for the rest of the route? Is everything multithreaded after "from", or is there a point of synchronization?
And what does this mean for "parallelProcessing" or "threads"?
In our case we use the jdbc component later in the route. Is camel-jdbc also using multiple threads?
How to know in general what threading model is used by a given component?
Let's start with your last question:
"How to know in general what threading model is used by a given component?"
You are probably asking which component is single-threaded by default and which ones are multi-threaded?
You need to ask yourself which approach makes the most sense for a component, and read the component's documentation. Normally the flags will tell you what behavior is applied by default. CXF is a component that requires a web server, jetty in this case, for a SOAP (over HTTP) client to be able to call the service. HTTP is a stateless protocol and a web server has to scale to many clients, so it makes a lot of sense for a web server to be multi-threaded. So yes, two simultaneous requests to a CXF endpoint are handled by two separate (jetty) threads. The route starting at the CXF endpoint is executed simultaneously by the jetty threads that received the requests.
On the contrary, if you are polling for file system changes, e.g. you want to check if a certain file was created, it makes no sense to apply multiple threads to the task of polling. Thus the file consumer is single threaded. The thread employed by the file consumer to do the polling will also execute your route that processes the file(s) that were picked up during a poll.
If processing the files identified by a poll takes a long time compared to your polling interval, and you cannot afford to miss a poll, then you need to hand off the processing of the rest of the route to another thread so your polling thread is again free to do, well, polling. Enter the Threads DSL.
Then you have processors like the splitter that create many tasks from a single task. To make the splitter work for everyone, it must be assumed that the tasks created by the splitter cannot be performed out of order and/or fully independently of each other. So the safe default is to run the steps wrapped by the split step in the thread that executes the route as a whole. But if you, the route author, know that the individual split items can be processed independently of each other, then you can parallelize the processing of the steps wrapped by the split step by setting parallelProcessing="true".
Both the threads DSL and parallelProcessing="true" acquire threads from a thread pool. Camel creates a pool for you. But if you want to use multiple pools or a pool with a different configuration, then you can always supply your own.
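The difference between the safe default and parallel processing of split items can be illustrated outside Camel with plain java.util.concurrent. The following is only an analogy, not Camel's implementation: SplitSketch, process, sequential, and parallel are hypothetical names, and the caller-supplied ExecutorService plays the role of the pool you would hand to Camel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitSketch {
    // Hypothetical per-item step standing in for "the steps wrapped by the split".
    static String process(String item) {
        return item.toUpperCase();
    }

    // Safe default in the analogy: every item runs on the calling thread, in order.
    static List<String> sequential(List<String> items) {
        List<String> out = new ArrayList<>();
        for (String item : items) {
            out.add(process(item));
        }
        return out;
    }

    // Parallel variant: each item is submitted to a caller-supplied pool.
    // Results are collected in input order, although items may run concurrently.
    static List<String> parallel(List<String> items, ExecutorService pool) {
        List<Future<String>> futures = new ArrayList<>();
        for (String item : items) {
            Callable<String> task = () -> process(item);
            futures.add(pool.submit(task));
        }
        List<String> out = new ArrayList<>();
        try {
            for (Future<String> f : futures) {
                out.add(f.get()); // blocks until that item has been processed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("item processing failed", e.getCause());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        ExecutorService pool = Executors.newFixedThreadPool(3); // "supply your own" pool
        try {
            System.out.println(sequential(items));
            System.out.println(parallel(items, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The parallel variant is only safe when process does not depend on the order or outcome of other items, which is exactly the condition the route author has to vouch for.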

Purpose and difference b/w Apache camel Kafka consumer URI option consumerStreams vs consumersCount

Apache Camel Kafka Consumer provides URI options called "consumerStreams" and "consumersCount".
I need to understand the difference, the usage scenarios, and how they fit with multi-partition Kafka topic message consumption.
consumersCount controls how many consumer instances are created by the Camel endpoint. So if you have 3 partitions in your topic and a consumersCount of 3, then you can consume 3 messages (1 per partition) at a time. This setting does exactly what you would expect from the documentation.
consumerStreams is a totally different setting and has imho a misleading name AND a misleading documentation.
Currently the documentation (https://github.com/apache/camel/blob/master/components/camel-kafka/src/main/docs/kafka-component.adoc) says:
consumerStreams: Number of concurrent consumers on the consumer
But the source code reveals its real purpose:
consumerStreams configures how many threads are available for all consumers to run on. Internally the Kafka endpoint creates one Runnable per consumer, so consumersCount = 3 means 3 Runnables. These Runnables are executed by a thread pool ExecutorService whose size is set by the consumerStreams setting.
Since the individual consumer threads are long-running tasks, the only purpose of consumerStreams can be to handle reconnection or blocked threads. A higher value for consumerStreams does not lead to more parallelization. It would be better named consumerThreadPoolSize or something like that.
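The claim that a larger pool does not add parallelism can be checked with a small plain-Java experiment. This is a sketch under the assumption that the endpoint's executor behaves like a fixed-size ThreadPoolExecutor; ConsumerPoolSketch and threadsUsed are hypothetical names, and the blocking Runnables merely stand in for long-running Kafka consumers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ConsumerPoolSketch {
    // Starts `consumers` long-running tasks on a fixed pool of `poolSize` threads
    // and reports how many worker threads the pool actually created.
    static int threadsUsed(int consumers, int poolSize) {
        // Same shape as Executors.newFixedThreadPool(poolSize).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch stop = new CountDownLatch(1);
        try {
            for (int i = 0; i < consumers; i++) {
                pool.execute(() -> {
                    try {
                        stop.await(); // stand-in for a consumer's endless poll loop
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            stop.countDown(); // let the stand-in consumers finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 consumers, pool of 3:  " + threadsUsed(3, 3) + " threads");
        System.out.println("3 consumers, pool of 10: " + threadsUsed(3, 10) + " threads");
    }
}
```

With one Runnable per consumer, threads beyond the number of consumers are never started, so raising the pool size past consumersCount changes nothing about how many messages are processed at once.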
I have checked the Camel Kafka source code, and it seems the use of these parameters has changed over time.
consumerStreams was used in old versions of the Camel-Kafka component such as 2.13, as you can see here.
consumersCount is used in the latest versions of the Camel-Kafka component (see here), and it represents the number of org.apache.kafka.clients.consumer.KafkaConsumer instances that will be instantiated, so you should really use this for multi-partition consumption.
It seems they were used together in Camel 2.16.

Parallel calls to google.appengine.api.channel.send_message

I am using send_message(client_id, message) in google.appengine.api.channel to fan out messages. The most common use case is two users. A typical trace looks like the following:
The two calls to send_message are independent. Can I perform them in parallel to save latency?
Well there's no async api available, so you might need to implement a custom solution.
Have you already tried native threading? It could work in theory, but because of the GIL it only helps if the XMPP API blocks on I/O, which I'm not sure it does.
A custom implementation will invariably come with some overhead, so it might not be the best idea for your simple case, unless it breaks the experience for the >2 user cases.
There is, however, another problem that might make it worth your while: what happens if the instance crashes and only got to send the first message? The api isn't transactional, so you should have some kind of safeguard. Maybe a simple recovery mode will suffice, given how infrequently this will happen, but I'm willing to bet a transactional message channel sounds more appealing, right?
Two ways you could go about it, off the top of my head:
Push a task for every message, they're transactional and guaranteed to run, and will execute in parallel with a fairly identical run time. It'll increase the time it takes for the first message to go out but will keep it consistent between all of them.
Use a service built for this exact use case, like firebase (though it might even be too powerful lol), in my experience the channel api is not very consistent and the performance is underwhelming for gaming, so this might make your system even better.
Fixed that for you
I just posted a patch on googleappengine issue 9157, adding:
channel.send_message_async for asynchronously sending a message to a recipient.
channel.send_message_multi_async for asynchronously broadcasting a single message to multiple recipients.
Some helper methods to make those possible.
Until the patch is pushed into the SDK, you'll need to include the channel_async.py file (that's attached on that thread).
Usage
import channel_async as channel
# this is synchronous D:
channel.send_message(<client-id>, <message>)
# this is asynchronous :D
channel.send_message_async(<client-id>, <message>)
# this is good for broadcasting a single message to multiple recipients
channel.send_message_multi_async([<client-id>, <client-id>], <message>)
# or
channel.send_message_multi_async(<list-of-client-ids>, <message>)
Benefits
Speed comparison on production:
Synchronous model: 2 - 80 ms per recipient (and blocking -.-)
Asynchronous model: 0.15 - 0.25 ms per recipient

Resources