Asynchronously process exchange while original payload remains intact - apache-camel

We had a requirement to process a file with more than 100k records. Each record should be generated as XML and saved to the DB. We were able to implement a system that can handle more than 500k records, and now we have a new requirement to transform each record into another form of XML and save it in another table for audit purposes.
I took the following approach: each record is read from the flat file and converted to the domain contract; then, in parallel, it is saved to the audit table, and transformed/enriched into another format and saved to the domain table. Here is the sample route that I am using.
<route>
<from uri="direct-vm:domainInXML" />
<setHeader headerName="auditID"><groovy>UUID.randomUUID().toString()</groovy> </setHeader>
<!-- async: transform and save domain XML to audit DB -->
<inOnly uri="vm:auditInXMLTransformAndDBPersistor"/>
<to uri="activemq:queue:domianInQueue?disableReplyTo=true"/>
</route>
<route>
<from uri="activemq:queue:domianInQueue" />
<!-- transform and enrich headers -->
<to uri="xslt:xslt/convertToInternalDomainContract.xsl" />
<to uri="direct-vm:transformAndSaveTODomainDB"/>
</route>
<route>
<from uri="vm:auditInXMLTransformAndDBPersistor?concurrentConsumers=3" />
<!-- transform and enrich headers -->
<to uri="xslt:xslt/convertToAuditDomainContract.xsl" />
<to uri="direct-vm:transformAndSaveTOAuditDB"/>
</route>
The question is: as we are processing thousands of records, do the transformation and persistence of the audit XML run in parallel in another thread, while the same domain XML is transformed into another format and saved to the domain DB?
Will there be any delay? Is there a better approach that you can suggest? When we save the audit XML to the DB we initially set the status to 'CREATE', and when transformation, validation, or persistence of the internal domain fails, we need to update the status to ERROR in the audit table using the auditID in the header.
During processing, when any record fails due to some error, I tried to update the status to ERROR using the auditID in the header, but by that time there is a chance the audit XML has not yet made it to the audit DB. How can I solve this? Any help is appreciated.

For use cases involving auditing, the Wire Tap EIP comes in handy. You can refer to http://camel.apache.org/wire-tap. Per the definition from Camel, Wire Tap (from the EIP patterns) allows you to route messages to a separate location while they are being forwarded to the ultimate destination.
Basically, a wire tap creates a different thread and performs the needed functionality with a copy of the original exchange. In case of failure you can use the same wire tap route to perform any update. To ensure the audit updates are processed sequentially, you can set concurrentConsumers to 1, so that you don't end up with a race condition.
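A minimal sketch of what the main route could look like with a wire tap, based on the routes above (the direct: endpoint name is an assumption for illustration):

```xml
<route>
    <from uri="direct-vm:domainInXML" />
    <setHeader headerName="auditID"><groovy>UUID.randomUUID().toString()</groovy></setHeader>
    <!-- a copy of the exchange is processed on a separate thread -->
    <wireTap uri="direct:auditInXMLTransformAndDBPersistor" />
    <!-- the original exchange continues to the domain queue untouched -->
    <to uri="activemq:queue:domianInQueue?disableReplyTo=true" />
</route>
<route>
    <from uri="direct:auditInXMLTransformAndDBPersistor" />
    <to uri="xslt:xslt/convertToAuditDomainContract.xsl" />
    <to uri="direct-vm:transformAndSaveTOAuditDB" />
</route>
```

The original exchange (including the auditID header) is copied by the wire tap, so the audit branch can run independently while the main route forwards the unmodified payload.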

Related

How to enrich message in File Route from SFTP to JMS in clustered environment using Apache Camel?

We are going to read a file from SFTP and put the employees in a database.
The XML structure is as follows:
<employees>
<employee></employee>
<employee></employee>
<employee></employee>
</employees>
The strategy I have in mind is:
Pick the XML file from SFTP ->
Fetch the employee numbers from the XML -> (24,000 employees)
Fetch the data from "System 1" and "System 2" based on the employee numbers in the XML file, all together in memory ->
Split the employees from the XML ->
Assign the data from "System 1" and "System 2" to each employee exchange ->
Put each employee XML on the JMS queue (read in a clustered environment)
Is there any better strategy to handle this scenario in Apache Camel, in the above route or in route 2 (the JMS queue)?
Note: as it is a clustered environment, we can't fetch the data from System 1 & System 2 on the servers themselves; we need to keep track by assigning a batch ID to the contents of one file. You can imagine this scenario in other integrations too.
The above steps can be improved by:
Speeding up the process after step 4 with parallel processing (e.g. use a thread pool)
Speeding up SFTP consumption by releasing the consumer thread after step 3 (e.g. use wireTap)
Storing the exchange in persistent storage (e.g. a JMS queue) to prevent data loss
Note that the application's memory usage will increase
If step 3 (fetching data from "System 1" and "System 2") can be done concurrently, launching multiple application instances with idempotency (leveraging an external cache or DB) might help when there are lots of files.
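A rough sketch of the wireTap + parallel-split idea (all endpoint URIs here are hypothetical placeholders, not the asker's actual configuration):

```xml
<route>
    <from uri="sftp://user@host/inbox" />
    <!-- hand the file off so the SFTP consumer thread is released immediately -->
    <wireTap uri="direct:processEmployees" />
</route>
<route>
    <from uri="direct:processEmployees" />
    <!-- stream the split to bound memory use; process employees in parallel -->
    <split streaming="true" parallelProcessing="true">
        <xpath>/employees/employee</xpath>
        <!-- enrich each employee from System 1 / System 2 here, then persist -->
        <to uri="jms:queue:employeeQueue" />
    </split>
</route>
```

Putting each enriched employee on a persistent JMS queue also covers the data-loss concern: any cluster node can consume from the queue, and messages survive a crash.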

External service too fast for the database in Laravel

I have an application which is connected to an external webservice. The webservice sends messages with an ID to the Laravel application. Within the controller I check whether the ID of the message already exists in the database. If not, I store the message with the ID; if it exists, I skip the message.
Unfortunately, sometimes the webservice sends a message with the same ID multiple times within the same second. It's an external service, so I have no control over it.
The problem is that the messages come in so fast that the database has not saved the first message before the next one arrives at the controller. As a result, the check for an existing ID fails and the application tries to save the same message once more. This leads to an exception, because I have a unique constraint on the ID column.
What is the best strategy to handle this? Using a queue is not a good solution, because the messages are time-critical; the queue is even slower and would lead to message congestion.
Any idea or help is appreciated a lot! Thanks!
You can send INSERT IGNORE statements to your database:
INSERT IGNORE INTO messages (...) VALUES (...)
or
INSERT INTO messages (...) VALUES (...) ON DUPLICATE KEY UPDATE id=id.
You can try updating on duplicate; that is an approach I have used in the past to get around issues like this. Not sure if it's the perfect solution, but it's definitely an option. I assume you are using MySQL.
https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html
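The effect of MySQL's INSERT IGNORE can be sketched with SQLite's equivalent, INSERT OR IGNORE (a standalone illustration with a made-up messages table, not the asker's Laravel code):

```python
import sqlite3

# in-memory DB standing in for the real messages table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, body TEXT)")

def store_message(msg_id, body):
    # a duplicate ID is silently skipped instead of raising an error
    cur = conn.execute(
        "INSERT OR IGNORE INTO messages (id, body) VALUES (?, ?)", (msg_id, body)
    )
    return cur.rowcount == 1  # True only if the row was actually inserted

print(store_message("abc-1", "first"))   # True: new row
print(store_message("abc-1", "again"))   # False: duplicate ignored
```

The database enforces uniqueness atomically, so two near-simultaneous requests with the same ID cannot both insert, which is exactly the race the read-then-write check in the controller cannot close.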

Avoiding duplicates

Imagine that we have a file, and some job that processes it and sends the data:
into the database
to an external service
Can we guarantee that the file is processed only once, or at least detect that something went wrong and notify the user so that they can resolve the problem manually?
Yes, you can.
What you can do is create a table in the database to store the name and a flag/status (read: yes/no) of each file. When a process drops the file in that location, make sure the same process records the name (if the name is different each time) and the flag/status for that file in the database. Your file-reading process can get the name of the file from the database, process it wherever you want, and when it's done, update the flag to "read" (or whatever you choose). This way you can avoid reading a file more than once.
I would store two tables of information in your database.
The processed file lines like you were already doing.
A record of the files themselves. Include:
the filename
whether the processing was successful, failed, partially succeeded
a SHA-1 checksum that can be used to check the uniqueness of the file later
When you go to process a file, first check whether its checksum already exists. If it does, you can stop processing and log the issue, or record that information in the file table.
Also be sure to have a foreign key association between your processed lines and your files. That way if something does go wrong, the person doing manual intervention can trace the affected lines.
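A minimal sketch of the checksum check described above, using an in-memory SQLite table (the files table and its columns are hypothetical names for illustration):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (sha1 TEXT PRIMARY KEY, filename TEXT, status TEXT)")

def try_register_file(filename: str, data: bytes) -> bool:
    """Record the file before processing; returns False if it was seen before."""
    digest = hashlib.sha1(data).hexdigest()
    seen = conn.execute("SELECT 1 FROM files WHERE sha1 = ?", (digest,)).fetchone()
    if seen:
        return False  # already processed (or in progress): log and skip
    conn.execute(
        "INSERT INTO files (sha1, filename, status) VALUES (?, ?, 'processing')",
        (digest, filename),
    )
    return True
```

Because the check hashes the content rather than the name, a re-upload of the same file under a different filename is still caught.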
Neither Usmana's nor Tracy's answer actually guarantees that the file is not processed more than once, or that your job doesn't send duplicate requests to the database and the external service (#1 and #2 in your question). Both solutions suggest keeping a log and updating it after all the processing is done, but if an error occurs when you try to update the log at the very end, your job will try to process the file again the next time it runs and will send duplicate requests to the database and the external service. The only way to deal with that using their suggestions is to run everything in a transaction, but that is quite a challenging task in a distributed environment like yours.
A common solution to your problem is to gracefully handle duplicate requests to the database and the external services. The actual implementation can vary, but for example you can add a unique constraint to the database; when the job tries to insert a duplicate record, an exception will be thrown, which you can simply ignore in the job, because it means the required data is already in the DB.
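That "insert and ignore the duplicate error" idea can be sketched like this with SQLite (the processed_lines table is a hypothetical stand-in for whatever the job writes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# the unique key makes re-inserting an already-processed line a harmless no-op
conn.execute("CREATE TABLE processed_lines (line_no INTEGER PRIMARY KEY, payload TEXT)")

def record_line(line_no: int, payload: str) -> bool:
    try:
        conn.execute(
            "INSERT INTO processed_lines (line_no, payload) VALUES (?, ?)",
            (line_no, payload),
        )
        return True
    except sqlite3.IntegrityError:
        # duplicate: the data is already there, so the retried insert is ignored
        return False
```

With writes made idempotent this way, a job that crashes mid-file can simply be re-run from the start without producing duplicate rows.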
My answer doesn't mean that you don't need the log table Usmana and Tracy suggested. You do need it to keep track of processing status, but it doesn't really guarantee there won't be duplicate requests to your database and the external service unless you use a distributed transaction.
Hope it helps!

How to process the data in the table, which is frequently inserted

I have a table dbo.RawMessage into which another system frequently inserts data (2 records per second).
I need to process the data in RawMessage and put the processed data in dbo.ProcessedMessage.
Because the processing logic is not very complicated, my approach was to add an insert trigger on the RawMessage table, but sometimes I get deadlocks.
I am using SQL Server Express.
My questions:
1. Is this a stupid approach?
2. If not, how can it be improved?
3. If yes, please show me a more graceful way

Sql Server Service Broker - thorough, in-use example of externally activated console app

I need some guidance from anyone who has deployed a real-world, in-production application that uses the Sql Server Service Broker external activation mechanism (via the Service Broker External Activator from the Feature Pack).
Current mindset:
My specs are rather simple (or at least I think so), so I'm thinking of the following basic flow:
order-like entity gets inserted into a Table_Orders with state "confirmed"
SP_BeginOrder gets executed and does the following:
begins a TRANSACTION
starts a DIALOG from Service_HandleOrderState to Service_PreprocessOrder
stores the conversation handle (from now on PreprocessingHandle) in a specific column of the Orders table
sends a MESSAGE of type Message_PreprocessOrder containing the order id using PreprocessingHandle
ends the TRANSACTION
Note that I'm not ending the conversation, I don't want "fire-and-forget"
event notification on Queue_PreprocessOrder activates an instance of PreprocessOrder.exe (max concurrent of 1) which does the following:
begins a SqlTransaction
receives top 1 MESSAGE from Queue_PreprocessOrder
if message type is Message_PreprocessOrder (format XML):
sets the order state to "preprocessing" in Table_Orders using the order id in the message body
loads n collections of data from which it computes an n-ary Cartesian product (via LINQ; AFAIK this is not possible in T-SQL) to determine the order items collection
inserts the order items rows into a Table_OrderItems
sends a MESSAGE of type Message_PreprocessingDone, containing the same order id, using PreprocessingHandle
ends the conversation pertaining to PreprocessingHandle
commits the SqlTransaction
exits with Environment.Exit(0)
internal activation on Queue_HandleOrderState executes a SP (max concurrent of 1) that:
begins a TRANSACTION
receives top 1 MESSAGE from Queue_InitiatePreprocessOrder
if message type is Message_PreprocessingDone:
sets the order state to "processing" in Table_Orders using the order id in the message body
starts a DIALOG from Service_HandleOrderState to Service_ProcessOrderItem
stores the conversation handle (from now on ProcessOrderItemsHandle) in a specific column of Table_Orders
creates a cursor for rows in Table_OrderItems for current order id and for each row:
sends a MESSAGE of type Message_ProcessOrderItem, containing the order item id, using ProcessOrderItemsHandle
if message type is Message_ProcessingDone:
sets the order state to "processed" in Table_Orders using the order id in the message body
if message type is http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog (END DIALOG):
ends the conversation pertaining to conversation handle of the message
ends the TRANSACTION
event notification on Queue_ProcessOrderItem activates an instance of ProcessOrderItem.exe (max concurrent of 1) which does the following:
begins a SqlTransaction
receives top 1 MESSAGE from Queue_ProcessOrderItem
if message type is Message_ProcessOrderItem (format XML):
sets the order item state to "processing" in Table_OrdersItems using the order item id in the message body, then:
loads a collection of order item parameters
makes a HttpRequest to a URL using the parameters
stores the HttpResponse as a PDF on filesystem
if any errors occurred in above substeps, sets the order item state to "error", otherwise "ok"
performs a lookup in the Table_OrdersItems to determine if all order items are processed (state is "ok" or "error")
if all order items are processed:
sends a MESSAGE of type Message_ProcessingDone, containing the order id, using ProcessOrderItemsHandle
ends the conversation pertaining to ProcessOrderItemsHandle
commits the SqlTransaction
exits with Environment.Exit(0)
Notes:
specs specify MSSQL compatibility 2005 through 2012, so:
no CONVERSATION GROUPS
no CONVERSATION PRIORITY
no POISON_MESSAGE_HANDLING ( STATUS = OFF )
I am striving to achieve overall flow integrity and continuity, not speed
given that tables and SPs reside in DB1 whilst Service Broker objects (messages, contracts, queues, services) reside in DB2, DB2 is SET TRUSTWORTHY
Questions:
Are there any major design flaws in the described architecture ?
Order completion state tracking doesn't seem right. Is there a better method ? Maybe using QUEUE RETENTION ?
My intuition tells me that in no case whatsoever should the activated external exe terminate with an exit code other than 0, so there should be try{..}catch(Exception e){..} finally{ Environment.Exit(0) } in Main. Is this assumption correct ?
How would you organize error handling in DB code ? Is an error log table enough?
How would you organize error handling in external exe C# code ? Same error logging
table ?
I've seen the SQL Server Service Broker Product Samples, but the Service Broker Interface seems overkill for my seemingly simpler case. Any alternatives for a simpler Service Broker object model ?
Any cross-version "portable" admin tool for Service Broker capable of at least draining poison messages ?
Have you any decent code samples for any of the above ?
Q: Are there any major design flaws in the described architecture ?
A: A couple of minor quirks:
- Waiting for an HTTP request to complete while holding a transaction open is bad. You can't achieve transactional consistency between a database and HTTP anyway, so don't risk having a transaction stretch for minutes when the HTTP call is slow. The typical pattern is to {begin tran / receive / begin conversation timer / commit}, then issue the HTTP call without any DB transaction. If the HTTP call succeeds, then {begin tran / send response / end conversation / commit}. If the HTTP call fails (or the client crashes), the conversation timer will activate you again; you'll get a timer message (no body), and you pick up the item ID associated with the handle from your table(s).
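A rough T-SQL sketch of that receive-then-timer step (the queue name comes from the question; the timeout value is illustrative):

```sql
BEGIN TRANSACTION;

DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME, @body XML;

RECEIVE TOP (1)
    @h = conversation_handle,
    @type = message_type_name,
    @body = CAST(message_body AS XML)
FROM Queue_ProcessOrderItem;

-- arm a timer so a crash after COMMIT re-activates us for this conversation
BEGIN CONVERSATION TIMER (@h) TIMEOUT = 120;

COMMIT;

-- ...now issue the HTTP call outside any transaction; on success:
-- BEGIN TRAN; SEND ON CONVERSATION @h ...; END CONVERSATION @h; COMMIT;
```

The timer is the safety net: if the process dies during the HTTP call, the timer message re-activates the application and it can retry the item looked up via the stored handle.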
Q: Order completion state tracking doesn't seem right. Is there a better method ? Maybe using QUEUE RETENTION ?
A: My one critique of your state tracking is the dependency on scanning the order items to determine that the currently processed one is the last one (5.3.4). For example, you could add a marker in the item state indicating that this is the 'last' item to be processed, so that when processing it you know you need to report completion. RETENTION is only useful for debugging, or when you have logic that requires running a 'logical rollback' and compensating actions on conversation error.
Q: My intuition tells me that in no case whatsoever should the activated external exe terminate with an exit code other than 0, so there should be try{..}catch(Exception e){..} finally{ Environment.Exit(0) } in Main. Is this assumption correct ?
A: The most important thing is for the activated process to issue a RECEIVE statement on the queue. If it fails to do so, the queue monitor may stay in the notified state forever. The exit code is, if I remember correctly, irrelevant. As with any background process, it is important to catch and log exceptions; otherwise you'll never even know it has a problem when it starts failing. In addition to disciplined try/catch blocks, hook up Application.ThreadException for UI apps and AppDomain.UnhandledException for both UI and non-UI apps.
Q: How would you organize error handling in DB code ? Is an error log table enough?
A: I will follow up on this later. An error log table is sufficient, IMHO.
Q: How would you organize error handling in external exe C# code ? Same error logging table ?
A: I created bugcollect.com exactly because I had to handle such problems with my own apps. The problem is more than logging: you also want some aggregation and analysis (at least detecting duplicate reports) and to suppress floods of errors caused by some deployment config mishap in the field. Truth be told, nowadays there are more options, e.g. exceptron.com. And of course I think FogBugz also has logging capabilities.
Q: I've seen the SQL Server Service Broker Product Samples, but the Service Broker Interface seems overkill for my seemingly simpler case. Any alternatives for a simpler Service Broker object model ?
A: An easy one, finally: yes, it is overkill. There is no simpler model.
Q: Any cross-version "portable" admin tool for Service Broker capable of at least draining poison messages ?
A: The problem with poison messages is that the definition of a poison message changes with your code: a poison message is whatever message breaks the current guards set in place to detect it.
Q: Have you any decent code samples for any of the above ?
A: No
One more point: try to avoid any reference from DB1 to DB2 (e.g. 4.3.4 is activated in DB1 and reads the items table from DB2). This creates cross-database dependencies which break when a) one DB is offline (e.g. for maintenance) or overloaded, or b) you add database mirroring for HA/DR and one DB fails over. Try to make the code work even if DB1 and DB2 are on different machines (with no linked servers); if necessary, add more info to the message payload. If you architect it that way, DB2 can live on a different machine, and even multiple DB2 machines can exist to scale out the HTTP/PDF-writing work.
And finally: this design will be very slow. I'm talking low tens of messages per second, with so many dialogs/messages involved and everything running with max_queue_readers 1. This may or may not be acceptable for you.
