I have an application that uses LMDB. If multiple processes need to write to the database, only one will be allowed to run at a time and the rest block. Because of this, I want to rewrite the application to use a client-server model.
If the application is written to use a client-server model, the server can manage the writes and the other processes won't block. However, if a client encounters an error and has to roll back its transaction, how can it roll back its data without rolling back what the other clients have written?
I've looked at nested transactions, but write transactions may only have one nested transaction. So while a client can write its data to a nested transaction and roll it back if an error occurs, only one client will be able to run at a time. So while that solves the rollback problem, we're back to the problem that only one client can write at a time.
I've also taken a look at the MDB_NOLOCK option, which causes LMDB to not stop you from creating multiple write transactions. When you try to commit any transaction but the first one, it will return an error. Maybe the clients can pool their writes into their own transactions and when they're ready to commit, the server will dump the entries into the first write transaction, but that is hacky and I'm certain that is NOT what the developers intended it to be used for.
The only other solution I can think of is to keep clients in a separate database, which undoes the entire purpose of switching to a client-server model.
Are there any other ways to allow different processes to write to the database, while being able to roll back one client's data without rolling back everything?
There is no easy solution to the challenge you pose.
One can nest write transactions arbitrarily deep, but as you say that doesn't help you achieve better throughput because you'll still be limited to one thread for that write transaction. So for most intents and purposes, if you use LMDB as it was designed to be used, you're only going to have one thread doing write operations on the database.
You asked how you can abort someone's transaction without rolling back the txn of others. This is largely not an issue because, as mentioned above, you will have only one write transaction active at at time. If you commit at the end of the request and start a new transaction at the beginning of the next request, your write transactions will not overlap. Once you commit the first transaction, you'll not be able to abort it, but if your client informs the server to abort the request before it's committed, your server can abort it. And when it aborts, the database state will resort to the state it was in when that transaction was started. It will not lose any other transactions and not lose the changes created by other requests.
As you point out, you could batch some of the write requests in to a single write txn. You could process some of those write requests in other threads, but all LMDB API calls must still be done in the original thread. And if you try to dedicate a thread to each request to achieve some parallelism, you'll still need to ensure that the requests are mutually compatible and don't interfere with each other. And if one of those requests runs in to trouble, you'll have to abort the transaction and probably restart the transaction. When you restart the transactions, you'll probably only include the requests that did not run in to trouble. -- This is all possible, but only you have enough knowledge about your application to know if this would improve performance much and be worth your effort.
Related
I have an application where I need to store some data in a database (mysql for instance) and then publish some data in a message queue. My problem is: If the application crashes after the storage in the database, my data will never be written in the message queue and then be lost (thus eventual consistency of my system will not be guaranted).
How can I solve this problem ?
I have an application where I need to store some data in a database (mysql for instance) and then publish some data in a message queue. My problem is: If the application crashes after the storage in the database, my data will never be written in the message queue and then be lost (thus eventual consistency of my system will not be guaranted). How can I solve this problem ?
In this particular case, the answer is to load the queue data from the database.
That is, you write the messages that need to be queued to the database, in the same transaction that you use to write the data. Then, asynchronously, you read that data from the database, and write it to the queue.
See Reliable Messaging without Distributed Transactions, by Udi Dahan.
If the application crashes, recovery is simple -- during restart, you query the database for all unacknowledged messages, and send them again.
Note that this design really expects the consumers of the messages to be designed for at least once delivery.
I am assuming that you have a loss-less message queue, where once you get a confirmation for writing data, the queue is guaranteed to have the record.
Basically, you need a loop with a transaction that can roll back or a status in the database. The pseudo code for a transaction is:
Begin transaction
Insert into database
Write to message queue
When message queue confirms, commit transaction
Personally, I would probably do this with a status:
Insert into database with a status of "pending" (or something like that)
Write to message queue
When message confirms, change status to "committed" (or something like that)
In the case of recovery from failure, you may need to check the message queue to see if any "pending" records were actually written to the queue.
I'm afraid that answers (VoiceOfUnreason, Udi Dahan) just sweep the problem under the carpet. The problem under carpet is: How the movement of data from database to queue should be designed so that the message will be posted just once (without XA). If you solve this, then you can easily extend that concept by any additional business logic.
CAP theorem tells you the limits clearly.
XA transactions is not 100% bullet proof solution, but seems to me best of all others that I have seen.
Adding to what #Gordon Linoff said, assuming durable messaging (something like MSMQ?) the method/handler is going to be transactional, so if it's all successful, the message will be written to the queue and the data to your view model, if it fails, all will fail...
To mitigate the ID issue you will need to use GUIDs instead of DB generated keys (if you are using messaging you will need to remove your referential integrity anyway and introduce GUIDS as keys).
One more suggestion, don't update the database, but inset only/upsert (the pending row and then the completed row) and have the reader do the projection of the data based on the latest row (for example)
Writing message as part of transaction is a good idea but it has multiple drawbacks like
If your
a. database/language does not support transaction
b. transaction are time taking operation
c. you can not afford to wait for queue response while responding to your service call.
d. If your database is already under stress, writing message will exacerbate the impact of higher workload.
the best practice is to use Database Streams. Most of the modern databases support streams(Dynamodb, mongodb, orcale etc.). You have consumer of database stream running which reads from database stream and write to queue or invalidate cache, add to search indexer etc. Once all of them are successful you mark the stream item as processed.
Pros of this approach
it will work in the case of multi-region deployment where there is a regional failure. (you should read from regional stream and hydrate all the regional data stores.)
No Overhead of writing more records or performance bottle necks of queues.
You can use this pattern for other data sources as well like caching, queuing, searching.
Cons
You may need to call multiple services to construct appropriate message.
One database stream might not be sufficient to construct appropriate message.
ensure the reliability of your streams, like redis stream is not reliable
NOTE this approach also does not guarantee exactly once semantics. The consumer logic should be idempotent and should be able to handle duplicate message
Scenario
The DB for an application has gone down. This results in any actor responsible for committing important data to the DB failing to get a connection
Preferred Behaviour
The important data is written to the db when it comes back up sometime in the future.
Current Implementation
The actor catches the DBException, wraps the data in a DBWriteFailed case class, and sends the message to its supervisor. The supervisor then schedules another write for sometime in the future (e.g. 1 minute) using system.scheduler.scheduleOnce(...) so that we don't spin in circles too much while waiting for the DB to come back up.
This implementation certainly works but I feel there might be a better way.
The protocol gets a bit messier when the committing actor has to respond to the original sender after a successful commit.
The regular flow of messages to the committing actor is not throttled in any way and the actor will happily process the new messages, likely failing to connect to the DB for each and every one of them.
If messages get caught in this retry loop for too long, the mailboxes of the committing actors will start to balloon. It is important that this data be committed, but none of it matters if the application crawls to a halt or crashes due to excessive memory usage.
I am an akka novice and I am largely inexperienced when it comes to supervisor strategies, but I feel as though I may be able to leverage one of those to handle some of this retry logic.
Is there a common approach in akka for solving a problem like this? Am I on the right track or should I be heading in a different direction?
Any help is appreciated.
You can use Akka Circuit Breaker to reduce connection attempts. Instead of using the scheduler as retry queue I would use a buffer (with max size limit) inside the actor and retry those when circuit breaker becomes closed again (onClose callback should send message to self actor). An alternative could be to combine the circuit breaker with a stashing mailbox.
If you're planning to implement full failover in your app
Don't.
Do not bubble database failover responsibility up into the app layer. As far as your app is concerned, the database should just be up and ready to accept reads and writes.
If your database goes down often, spend time making your database more robust (there's a multitude of resources on the web already for this: search the web for terms like 'replication', 'high availability', 'load-balancing' and 'clustering', and learn from the war stories of others at highscalability.com). It all really depends on what the cause of your DB outages are (e.g. I once maxed out the NIC on the DB master, and "fixed" the problem intermittently by enabling GZIP on the wire).
You'll be glad you adhered to a separation of concerns if you go down this route.
If you're planning to implement the odd sprinkling of retry logic and handling DB brown-outs
If you're not expecting your app to become a replacement database, then Patrik's answer is the best way to go.
I am trying to write a program which will be run in batch on AS400. This program is going to write a record into a file to reflect its processing status, say, when it is just submitted it adds a record saying it is currently running, and when it is done it updates the same record saying it has finished. If I want to submit this program into batch for multiple times, what is the best way to cope with this type of simultaneous file access to increase the efficiency? I don't want a job to lock the whole file and stops others from updating it in the same time. It can lock the record it needs and leave the rest to others. How to achieve this? RPGLE or QMQRY? Or any other methods?
RPG will not lock the entire file, only the record.
Personally, I'd recommend SQL for (pretty much) all file access, even through RPG. IBM hasn't been updating their Native I/O for a while, just concentrating on the SQL side of things.
Because, during normal use, record locks in RPG are released once the write or update has been performed, you should probably just have your SQL run WITH NC (no commit). You need a way to tie a processing job with the data it's processing anyways (assuming stuff is long-running enough that things are in files outside of QTEMP) - you want to be able to pick up where you left off, if your job dies (so, you can't rely on holding the lock as a control mechanism). So don't forget that you're going to need some sort of monitor job (that can at least report the status, if not resubmit things - look at the QUSRJOBI API).
If you're doing this because you're using all Native I/O, and processing huge sets of data (not huge, processor intensive calculations), consider re-writing everything to SQL. Seriously. You can get way better performance - we've taken a process that used to run for 25+ hours, to something that runs in 2.5ish.
There's an app that starts a transaction on SQL Server 2008 and moves some data around. Then, while the transaction is still not committed, the app prints out some labels. It is very important that the transaction is not committed until printing succeeded; if a printing error occurs, everything is rolled back.
Now, the printing engine is a) grew quite huge and complex, and b) is eventually required from lots of places. It is therefore decided to separate the engine and make it a service.
Yes, it is possible to pass all data required for printing from the client app to that server so that the server only prints and is not concerned about databases. However, that would mean leaving piles of code and label templates in each application that requires printing; effectively, very little separation will then occur. On contrary, it would be extreemely efficient (and easier for me to write and maintain) to just pass the IDs of what is required to the service which then would go to the database and get the data. All formats and layouts will be centralized and apps will only ask for 5 delivery notes from print job 12345.
Now, this is not going to happen as the transaction is still not committed at the moment of printing. The service would not be able to read the data, and using READ UNCOMMITTED is not quite an option.
I was going to use the good old sp_bindsession to join the two sessions, app's and service's, but then it is suddenly deprecated and to be removed from future releases. The help suggests I use MARS or distributed transactions instead, but I can't see how they would help.
Any advice?
My gut feeling is that attempting to share a transaction between two processes in this way is not a good idea.
My approach would be to either to pass all data to service, or investigate alternatives to keeping the transaction open for the duration of the printing - would a simpler mechanism (such as an IsPrinted flag for each record) not suffice?
Failing that, the eaisest way I can see of doing this would be to have the printing service pass all of its SQL requests back through to the originating process so that they can be executed in the context of the original transaction.
Only sp_getbindtoken/sp_bindsession can do what you ask, and it is deprecated and will be removed.
In theory you should use short transactions, represent the 'printing' state as a committed state, and have compensating actions if the print fails. Also if the printing engine is exposed as a service, it should be autonomous and receive as a message all data it needs to print (like label templates). I understand this is easy for me to to say but may be a major undertaking on the product.
For the moment I think your best bet is to use the session binding tokens. Altough I have to call out that leaving transactions open for the duration of physical operations (printing) is a very bad practice.
I am implementing an undo button for a lengthy operation on a web app. Since the undo will come in another request, I have to commit the operation.
Is there a way to issue a hint on the transaction like "maybe rollback"? So after the transaction is committed, I could still rollback in another processes if needed.
Otherwise the Undo function will be as complex as the operation it is undoing.
Is this possible? Other ideas welcome!
Another option you might want to consider:
If it's possible for the user to 'undo' the operation, you may want to implement an 'intent' table where you can store pending operations. Once you go through your application flow, the user would need to Accept or Undo the operation, at which point you can just run the pending transaction and apply it to your database.
We have a similar system in place on our web application, where a user can submit a transaction for processing and has until 5pm on the day it's scheduled to run to cancel it. We store this in an intent table and process any transactions scheduled for that day after the daily cutoff time. In your case you would need an explicit 'Accept' or 'Undo' operation from the user after the initial 'lengthy operation', so that would change your process a little bit.
Hope this helps.
The idea in this case is to log for each operation - a counter operation that do the opposite, in a special log and when you need to rollback you actually run the commands that you've logged.
some databases have flashback technology that you can ask the DB to go back to certain date and time. but you need to understand how it works, and make sure it will only effect the data you want and not other staff...
Oracle Flashback
I don't think there is a similar technology on SQL server and there is a SO answer that says it doesn't, but SQL keeps evolving...
I'm not sure what technologies you're using here, but there are certainly better ways of going about this. You could start by storing the data in the session and committing when they are on the final page. Many frameworks these days also allow for long running transactions that span multiple requests.
However, you will probably want to commit at the end of every page and simply set some sort of flag for when the process is completed. That way if something goes wrong in the middle of it the user can recover and all is not lost.