We are considering using Camel framework inside one of our integration systems.
Speed processing is one of our main requirements and part of our architecture we are considering a custom caching mechanism. At a high level if some data exists in the cache there is no need to bring it from our storage (say database).
However data consistency is another important requirement. We don't want to have data in the cache that does not reflect the data in the storage. In other words we need our cache being written (updated) only if the storage commit was successful.
So my question is what would be the best way to execute an action (such us updating the cache) only if a transnational route completes. In a normal spring only application I would use either a TransactionalEventListener with a TransactionPhase.AFTER_COMMIT phase or just a plain non declarative transaction (begin and end of transaction managed inside the code).
Camel frameworks comes instead with so many good to have features that it would be a pity not to use it because this is not possible. I am sure our use case it is not that unusual and hope to hear some advice about how to achieve this.
Thank you in advance,
Julian
Check out onCompletion.
By default the onCompletion will be triggered when the Exchange is complete and regardless if the Exchange completed with success or with an failure (such as an Exception was thrown). You can limit the trigger to only occur onCompleteOnly or by onFailureOnly
Related
If I have multiple identical containers deployed simultaneously, and each contains a job to periodically create an artifact and save to a database, and what they save is deterministic, how should I go about preventing redundant operations?
Should I check the key in the database to see if it exists first, and if it doesn't, begin the saving operation? The artifact creation process is lengthy, so it's quite likely that one container may check the DB, see that it hasn't been saved to yet, and start the artifact creation process ... in the meantime, the other container may do the same.
I realize that having multiple clones of the same container is good for preventing downtime / keeping the application robust, but how should you deal with side effects?
This is a pretty open-ended question, so there isn't going to be one definitive answer without knowing the exact specifics of your situation.
Generally speaking in situations like this you should try to make the action that is being performed idempotent if possible, thus removing the issues if multiple requests are sent to perform the same action.
The question I would be asking myself is whether or not your architecture and technology stack is sutiable for this task. Not every activity needs to be performed in Kubernetes.
Would a Kubernetes CronJob be more sutiable for this?
What about a using messaging queue?
I am currently wondering how to handle application errors in Apache Flink streaming applications. In general, I see two cases:
Transient errors, where you want the input data to be replayed and processing might succeed on second try. An example would be a dependency on an external service, which is temporarily unavailable.
Permanent errors, where repeated processing will still fail; for example invalid input data.
For the first case, it looks like the common solution is to just throw some exception. Or is there a better way, e.g. a special kind of exception for more efficient handling such as FailedException from Apache Storm Trident (see Error handling in Storm Trident topologies).
For permanent errors, I couldn't find any information online. A map() operation, for example, always has to return something so one cannot just silently drop messages as you would in Trident.
What are the available APIs or best practices? Thanks for your help.
Since this question was asked, there has been some development:
This discussion holds the background of why side outputs should help, key extract:
Side outputs(a.k.a Multi-outputs) is one of highly requested features
in high fidelity stream processing use cases. With this feature, Flink
can
Side output corrupted input data and avoid job fall into “fail -> restart -> fail” cycle
Side output sparsely received late arriving events while issuing aggressive watermarks in window computation.
This resulted in jira: FLINK-4460 which has been resolved in Flink 1.1.3 and above.
I hope this helps, if an even more generic solution would be desireable, please think a bit on your usecase and consider to create a jira for it.
Internet says using database for queues is an anti-pattern, and you should use (RabbitMQ or Beanstalked or etc)
But I want all requests stored. So I can later lookup how long they took, any failed attempts or errors or notes logged, who requested it and with what metadata, what was the end result, etc.
It looks like all the queue libraries don't have this option. You can't persist the data to allow you to query it later.
I want what those queues do, but with a "persist to database" option. Does this not exist? How do people deal with this? Do you use a queue library and copy over all request information into your database when the request finishes?
(the language/database I'm using is anything, whatever works best for this)
If you want to log requests, and meta-data about how long they took etc, then do so - log it to the database when you know the relevant results, and run your analytic queries as you would expect to.
The reason to not be using the database as a temporary store is that under high traffic, the searching for, and locking of unprocessed jobs, and then updating or deleting them when they are complete, can take a great deal of effort. That is especially true if don't remove jobs from the active table, and so have to search ever more completed jobs to find those that have yet to be done.
One can implement the task queue by themselves using a persistent backend (like database) to persist the tasks in queues. But the problem is, it may not scale well and also, it is always better to use a proven implementation instead of reinventing the wheel. These are tougher problems to solve and it is better to use the existent frameworks.
For instance, if you are implementing in Python, the typical choice is to use Celary with Redis/RabbitMQ backend.
I'm currently developing a Camel Integration app in which resumption from a previous state of processing is important. When there's a power outage, for instance, it's important that all previously processed messages are not re-processed. The processing should resume from where it left off before the outage.
I've gone through a number of possible solutions including Terracotta and Apache Shiro. I'm not sure how to use either as documentation on the integration with Apache Camel is scarce. I've not settled on the two, however.
I'm looking for suggestions on the potential alternatives I can use or a pointer to some tutorial to get me started.
The difficulty in surviving outages lies primarily in state, and what to do with in-flight messages.
Usually, when you're talking state within routes the solution is to flush it to disk, or other nodes in the cluster. Taking the aggregator pattern as an example, aggregated state is persisted in an aggregation repository. The default implementation is in memory, so if the power goes out, all the state is lost. However, there are other implementations, including one for JDBC, and another using Hazelcast (a lightweight in-memory data grid). I haven't used Hazelcast myself, but JDBC does a synchronous write to disk. The aggregator pattern allows you to resume from where you left off. A similar solution exists for idempotent consumption.
The second question, around in-flight messages is a little more complicated, and largely depends on where you are consuming from. If you're in the middle of handling a web service request, and the power goes out, does it matter if you have lost the message? The user can simply retry. Any effects on external systems can be wrapped in a transaction, or an idempotent consumer with JDBC idempotent repository.
If you are building out integrations based on messaging, you should consume within a transaction, so that if your server goes down, the messages go back into the broker and can be replayed to another consumer.
Be careful when using seda: or threads blocks, these use an in-memory queue to pass exchanges between threads, any messages flowing down these sorts of routes will be lost if someone trips over the power cable. If you can't afford message loss, and need this sort of processing model, consider using a JMS queue as the endpoints between the two routes (with transactions to ensure you pick up where you left off).
I was looking to implement CQRS pattern. For the process of updating the read database, is it best to use a windows service, or to update the view at the time of creating a new record in the update database? Is it best to use triggers, or some other process? I've seen a couple of approaches and haven't made up my mind what is the best approach to achieve this.
Thanks.
Personally I love to use messaging to solve these kind of problems.
You commands result in events when they are processed and if you use messaging to publish the events one or more downstream read services can subscribe to the events and process them to update the read models.
The reason why messaging is nice in this case is that it allows you to decouple the write and read side from each other. Also, it allows you to easily have several subscribers if you find a need for it. Additionally, messaging using a persistent queuing system like MSMQ enables retrying of failed messages. It also means that you can take a read model offline (for updates etc) and when it comes back up it can then process all the events in the queue.
I'm no friend of Triggers in relational databases, but I imagine the must be pretty hard to test. And triggers would introduce routing logic where it doesn't belong. Could it be also that if the trigger action fails, the entire write transaction rolls back? Triggers is probably the least beneficial solution.
It depends on how tolerant your application must be with regards to eventual consistency.
If your app has no problem with read data being 5 minutes old, there's no need to denormalize upon every write data change. In that case, a background service that kicks in every n minutes or that kicks in only when the CPU consumption is below a certain threshold, for instance, can be a good solution.
If, on the other hand, your app is time-sensitive, such as in the case of frequently changing statuses, machine monitoring, stock exchange data etc., then you will want to keep the lag as low as possible and denormalize on the spot -- that is, in-process or at least in real-time. So in this case you may choose to run the denormalizers in a constantly-running process or to add them to the chain of event handlers straight in your code.
Your call.