I am implementing an undo button for a lengthy operation on a web app. Since the undo will come in another request, I have to commit the operation.
Is there a way to issue a hint on the transaction like "maybe rollback"? So after the transaction is committed, I could still rollback in another processes if needed.
Otherwise the Undo function will be as complex as the operation it is undoing.
Is this possible? Other ideas welcome!
Another option you might want to consider:
If it's possible for the user to 'undo' the operation, you may want to implement an 'intent' table where you can store pending operations. Once you go through your application flow, the user would need to Accept or Undo the operation, at which point you can just run the pending transaction and apply it to your database.
We have a similar system in place on our web application, where a user can submit a transaction for processing and has until 5pm on the day it's scheduled to run to cancel it. We store this in an intent table and process any transactions scheduled for that day after the daily cutoff time. In your case you would need an explicit 'Accept' or 'Undo' operation from the user after the initial 'lengthy operation', so that would change your process a little bit.
Hope this helps.
The idea in this case is to log for each operation - a counter operation that do the opposite, in a special log and when you need to rollback you actually run the commands that you've logged.
some databases have flashback technology that you can ask the DB to go back to certain date and time. but you need to understand how it works, and make sure it will only effect the data you want and not other staff...
Oracle Flashback
I don't think there is a similar technology on SQL server and there is a SO answer that says it doesn't, but SQL keeps evolving...
I'm not sure what technologies you're using here, but there are certainly better ways of going about this. You could start by storing the data in the session and committing when they are on the final page. Many frameworks these days also allow for long running transactions that span multiple requests.
However, you will probably want to commit at the end of every page and simply set some sort of flag for when the process is completed. That way if something goes wrong in the middle of it the user can recover and all is not lost.
Related
I have an application that uses LMDB. If multiple processes need to write to the database, only one will be allowed to run at a time and the rest block. Because of this, I want to rewrite the application to use a client-server model.
If the application is written to use a client-server model, the server can manage the writes and the other processes won't block. However, if a client encounters an error and has to roll back its transaction, how can it roll back its data without rolling back what the other clients have written?
I've looked at nested transactions, but write transactions may only have one nested transaction. So while a client can write its data to a nested transaction and roll it back if an error occurs, only one client will be able to run at a time. So while that solves the rollback problem, we're back to the problem that only one client can write at a time.
I've also taken a look at the MDB_NOLOCK option, which causes LMDB to not stop you from creating multiple write transactions. When you try to commit any transaction but the first one, it will return an error. Maybe the clients can pool their writes into their own transactions and when they're ready to commit, the server will dump the entries into the first write transaction, but that is hacky and I'm certain that is NOT what the developers intended it to be used for.
The only other solution I can think of is to keep clients in a separate database, which undoes the entire purpose of switching to a client-server model.
Are there any other ways to allow different processes to write to the database, while being able to roll back one client's data without rolling back everything?
There is no easy solution to the challenge you pose.
One can nest write transactions arbitrarily deep, but as you say that doesn't help you achieve better throughput because you'll still be limited to one thread for that write transaction. So for most intents and purposes, if you use LMDB as it was designed to be used, you're only going to have one thread doing write operations on the database.
You asked how you can abort someone's transaction without rolling back the txn of others. This is largely not an issue because, as mentioned above, you will have only one write transaction active at at time. If you commit at the end of the request and start a new transaction at the beginning of the next request, your write transactions will not overlap. Once you commit the first transaction, you'll not be able to abort it, but if your client informs the server to abort the request before it's committed, your server can abort it. And when it aborts, the database state will resort to the state it was in when that transaction was started. It will not lose any other transactions and not lose the changes created by other requests.
As you point out, you could batch some of the write requests in to a single write txn. You could process some of those write requests in other threads, but all LMDB API calls must still be done in the original thread. And if you try to dedicate a thread to each request to achieve some parallelism, you'll still need to ensure that the requests are mutually compatible and don't interfere with each other. And if one of those requests runs in to trouble, you'll have to abort the transaction and probably restart the transaction. When you restart the transactions, you'll probably only include the requests that did not run in to trouble. -- This is all possible, but only you have enough knowledge about your application to know if this would improve performance much and be worth your effort.
I recently came up with a case that makes me wonder if I'm a newbie or something trivial has escaped to me.
Suppose I have a software to be run by many users, that uses a table. When the user makes login in the app a series of information from the table appears and he has just to add and work or correct some information to save it. Now, if the software he uses is run by many people, how can I guarantee is he is the only one working with that particular record? I mean how can I know the record is not selected and being worked by 2 or more users at the same time? And please I wouldn't like the answer use “SELECT FOR UPDATE... “
because for what I've read it has too negative impact on the database. Thanks to all of you. Keep up the good work.
This is something that is not solved primarily by the database. The database manages isolation and locking of "concurrent transactions". But when the records are sent to the client, you usually (and hopefully) closed the transaction and start a new one when it comes back.
So you have to care yourself.
There are different approaches, the ones that come into my mind are:
optimistic locking strategies (first wins)
pessimistic locking strategies
last wins
Optimistic locking: you check whether a record had been changed in the meanwhile when storing. Usually it does this by having a version counter or timestamp. Some ORMs and frameworks may help a little to implement this.
Pessimistic locking: build a mechanism that stores the information that someone started to edit something and do not allow someone else to edit the same. Especially in web projects it needs a timeout when the lock is released anyway.
Last wins: the second person storing the record just overwrites the first changes.
... makes me wonder if I'm a newbie ...
That's what happens always when we discover that very common stuff is still not solved by the tools and frameworks we use and we have to solve it over and over again.
Now, if the software he uses is runed by many people how can I guarantee is he
is the only one working with that particular record.
Ah...
And please I wouldn't like the answer use “SELECT FOR UPDATE... “ because for
what I've read it has too negative impact on the database.
Who cares? I mean, it is the only way (keep a lock on a row) to guarantee you are the only one who can change it. Yes, this limits throughput, but then this is WHAT YOU WANT.
It is called programming - choosing the right tools for the job. IN this case impact is required because of the requirements.
The alternative - not a guarantee on the database but an application server - is an in memory or in database locking mechanism (like a table indicating what objects belong to what user).
But if you need to guarantee one record is only used by one person on db level, then you MUST keep a lock around and deal with the impact.
But seriously, most programs avoid this. They deal with it either with optimistic locking (second user submitting changes gets error) or other programmer level decisions BECAUSE the cost of such guarantees are ridiculously high.
Oracle is different from SQL server.
In Oracle, when you update a record or data set the old information is still available because your update is still on hold on the database buffer cache until commit.
Therefore who is reading the same record will be able to see the old result.
If the access to this record though is a write access, it will be a lock until commit, then you'll have access to write the same record.
Whenever the lock can't be resolved, a deadlock will pop up.
SQL server though doesn't have the ability to read a record that has been locked to write changes, therefore depending which query you're running, you might lock an entire table
First you need to separate queries and insert/updates using a data-warehouse database. Which means you could solve slow performance in update that causes locks.
The next step is to identify what is causing locks and work out each case separately.
rebuilding indexes during working hours could cause very nasty locks. Push them to after hours.
I have an asp.net-mvc website and people manage a list of projects. Based on some algorithm, I can tell if a project is out of date. When a user logs in, i want it to show the number of stale projects (similar to when i see a number of updates in the inbox).
The algorithm to calculate stale projects is kind of slow so if everytime a user logs in, i have to:
Run a query for all project where they are the owner
Run the IsStale() algorithm
Display the count where IsStale = true
My guess is that will be real slow. Also, on everything project write, i would have to recalculate the above to see if changed.
Another idea i had was to create a table and run a job everything minutes to calculate stale projects and store the latest count in this metrics table. Then just query that when users log in. The issue there is I still have to keep that table in sync and if it only recalcs once every minute, if people update projects, it won't change the value until after a minute.
Any idea for a fast, scalable way to support this inbox concept to alert users of number of items to review ??
The first step is always proper requirement analysis. Let's assume I'm a Project Manager. I log in to the system and it displays my only project as on time. A developer comes to my office an tells me there is a delay in his activity. I select the developer's activity and change its duration. The system still displays my project as on time, so I happily leave work.
How do you think I would feel if I receive a phone call at 3:00 AM from the client asking me for an explanation of why the project is no longer on time? Obviously, quite surprised, because the system didn't warn me in any way. Why did that happen? Because I had to wait 30 seconds (why not only 1 second?) for the next run of a scheduled job to update the project status.
That just can't be a solution. A warning must be sent immediately to the user, even if it takes 30 seconds to run the IsStale() process. Show the user a loading... image or anything else, but make sure the user has accurate data.
Now, regarding the implementation, nothing can be done to run away from the previous issue: you will have to run that process when something that affects some due date changes. However, what you can do is not unnecessarily run that process. For example, you mentioned that you could run it whenever the user logs in. What if 2 or more users log in and see the same project and don't change anything? It would be unnecessary to run the process twice.
Whatsmore, if you make sure the process is run when the user updates the project, you won't need to run the process at any other time. In conclusion, this schema has the following advantages and disadvantages compared to the "polling" solution:
Advantages
No scheduled job
No unneeded process runs (this is arguable because you could set a dirty flag on the project and only run it if it is true)
No unneeded queries of the dirty value
The user will always be informed of the current and real state of the project (which is by far, the most important item to address in any solution provided)
Disadvantages
If a user updates a project and then upates it again in a matter of seconds the process would be run twice (in the polling schema the process might not even be run once in that period, depending on the frequency it has been scheduled)
The user who updates the project will have to wait for the process to finish
Changing to how you implement the notification system in a similar way to StackOverflow, that's quite a different question. I guess you have a many-to-many relationship with users and projects. The simplest solution would be adding a single attribute to the relationship between those entities (the middle table):
Cardinalities: A user has many projects. A project has many users
That way when you run the process you should update each user's Has_pending_notifications with the new result. For example, if a user updates a project and it is no longer on time then you should set to true all users Has_pending_notifications field so that they're aware of the situation. Similarly, set it to false when the project is on time (I understand you just want to make sure the notifications are displayed when the project is no longer on time).
Taking StackOverflow's example, when a user reads a notification you should set the flag to false. Make sure you don't use timestamps to guess if a user has read a notification: logging in doesn't mean reading notifications.
Finally, if the notification itself is complex enough, you can move it away from the relationship between users and projects and go for something like this:
Cardinalities: A user has many projects. A project has many users. A user has many notifications. A notifications has one user. A project has many notifications. A notification has one project.
I hope something I've said has made sense, or give you some other better idea :)
You can do as follows:
To each user record add a datetime field sayng the last time the slow computation was done. Call it LastDate.
To each project add a boolean to say if it has to be listed. Call it: Selected
When you run the Slow procedure set you update the Selected fileds
Now when the user logs if LastDate is enough close to now you use the results of the last slow computation and just take all project with Selected true. Otherwise yourun again the slow computation.
The above procedure is optimal, becuase it re-compute the slow procedure ONLY IF ACTUALLY NEEDED, while running a procedure at fixed intervals of time...has the risk of wasting time because maybe the user will neber use the result of a computation.
Make a field "stale".
Run a SQL statement that updates stale=1 with all records where stale=0 AND (that algorithm returns true).
Then run a SQL statement that selects all records where stale=1.
The reason this will work fast is because SQL parsers, like PHP, shouldn't do the second half of the AND statement if the first half returns true, making it a very fast run through the whole list, checking all the records, trying to make them stale IF NOT already stale. If it's already stale, the algorithm won't be executed, saving you time. If it's not, the algorithm will be run to see if it's become stale, and then stale will be set to 1.
The second query then just returns all the stale records where stale=1.
You can do this:
In the database change the timestamp every time a project is accessed by the user.
When the user logs in, pull all their projects. Check the timestamp and compare it with with today's date, if it's older than n-days, add it to the stale list. I don't believe that comparing dates will result in any slow logic.
I think the fundamental questions need to be resolved before you think about databases and code. The primary of these is: "Why is IsStale() slow?"
From comments elsewhere it is clear that the concept that this is slow is non-negotiable. Is this computation out of your hands? Are the results resistant to caching? What level of change triggers the re-computation.
Having written scheduling systems in the past, there are two types of changes: those that can happen within the slack and those that cause cascading schedule changes. Likewise, there are two types of rebuilds: total and local. Total rebuilds are obvious; local rebuilds try to minimize "damage" to other scheduled resources.
Here is the crux of the matter: if you have total rebuild on every update, you could be looking at 30 minute lags from the time of the change to the time that the schedule is stable. (I'm basing this on my experience with an ERP system's rebuild time with a very complex workload).
If the reality of your system is that such tasks take 30 minutes, having a design goal of instant gratification for your users is contrary to the ground truth of the matter. However, you may be able to detect schedule inconsistency far faster than the rebuild. In that case you could show the user "schedule has been overrun, recomputing new end times" or something similar... but I suspect that if you have a lot of schedule changes being entered by different users at the same time the system would degrade into one continuous display of that notice. However, you at least gain the advantage that you could batch changes happening over a period of time for the next rebuild.
It is for this reason that most of the scheduling problems I have seen don't actually do real time re-computations. In the context of the ERP situation there is a schedule master who is responsible for the scheduling of the shop floor and any changes get funneled through them. The "master" schedule was regenerated prior to each shift (shifts were 12 hours, so twice a day) and during the shift delays were worked in via "local" modifications that did not shuffle the master schedule until the next 12 hour block.
In a much simpler situation (software design) the schedule was updated once a day in response to the day's progress reporting. Bad news was delivered during the next morning's scrum, along with the updated schedule.
Making a long story short, I'm thinking that perhaps this is an "unask the question" moment, where the assumption needs to be challenged. If the re-computation is large enough that continuous updates are impractical, then aligning expectations with reality is in order. Either the algorithm needs work (optimizing for local changes), the hardware farm needs expansion or the timing of expectations of "truth" needs to be recalibrated.
A more refined answer would frankly require more details than "just assume an expensive process" because the proper points of attack on that process are impossible to know.
I did some changes to a postgres table and I want to revert it back to a previous state. There is no back up of the database. Is there a way to do it? As in, does postgres take auto snap shots and store it somewhere or the original data is lost forever?
By default PostgreSQL won't store all of your old data -- that would surprise many people of course. However, it has a built-in Point-in-time Recovery mechanism which does pretty much exactly what you want. You have to keep an archive of "write-ahead log files" which represent the changes made to the database, and you can take periodic base backups. When you want to recover you can recovery to a specific time or even a specific transaction ID.
If you don't have a backup, and did an operation on data, I don't think there is much you can do : the database now has your new data, and the old version of it has be replaced/deleted.
The goal of a database engine is to persist the data you store in it -- not the data you removed from it.
For the next time : if you need to try something, use a transaction, and don't commit it until you are sure what you did is OK -- juste beware not to wait for too long before commitint or rollbacking it, because it might lock some stuff, preventing other people from doing queries too.
I do not have a way to test now so is it possible for you to confirm me the question of the title?
I mean in a ADO.NET database transaction, I can update/insert thousands of records before commiting to the database. In Active Directory using System.Directory.Services it seems I need to commit for every entry (or record) that I update/insert.
Thanks.
Active Directory is not a transactional store - so you don't have the transaction support like you have with a database.
Your observation is absolutely correct - with Active Directory, you deal on a per-object basis; you can retrieve an object, manipulate it, and then save back all the changes (or discard them) - but you don't have any transaction support to roll back a whole series of operations.
If you really must have this capability, you'd have to write your own Resource Manager for AD (see some ideas here in MSDN) - this would allow you to wrap your AD operations in a TransactionScope() and roll them back. I don't think this is a trivial undertaking, otherwise, someone would have done it already....
So your current observations are absolutely correct, and without a whole lot of effort, this cannot be changed, unfortunately.