Reverting a Postgres table back to a previous state

I made some changes to a Postgres table and I want to revert it back to a previous state. There is no backup of the database. Is there a way to do it? As in, does Postgres take automatic snapshots and store them somewhere, or is the original data lost forever?

By default PostgreSQL won't keep all of your old data -- storing everything forever would surprise many people, of course. However, it has a built-in point-in-time recovery (PITR) mechanism which does pretty much exactly what you want: you keep an archive of "write-ahead log" (WAL) files, which represent the changes made to the database, and you take periodic base backups. When you want to recover, you can recover to a specific time or even a specific transaction ID.
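A minimal sketch of that setup on a current PostgreSQL version (the archive paths and the target time are placeholders, not a recipe for this exact situation):

# postgresql.conf on the server you want to protect
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'    # copy each WAL segment to an archive

# taken periodically from the shell: a base backup to restore from later
pg_basebackup -D /mnt/base_backup -X stream

# to recover (PostgreSQL 12+): restore the base backup, create an empty recovery.signal
# file in the data directory, and point the server at the archive and the stop point
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2024-05-01 10:00:00'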

If you don't have a backup and already performed the operation on the data, I don't think there is much you can do: the database now has your new data, and the old version of it has been replaced/deleted.
The goal of a database engine is to persist the data you store in it -- not the data you removed from it.
For next time: if you need to try something, use a transaction, and don't commit it until you are sure what you did is OK. Just beware not to wait too long before committing or rolling it back, because an open transaction can hold locks and block other people's queries.
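For example (table and column names are made up):

BEGIN;
UPDATE products SET price = price * 1.1 WHERE category = 'books';
SELECT * FROM products WHERE category = 'books';   -- inspect the result first
ROLLBACK;                                           -- or COMMIT; once you are sure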

Related

How do real-time collaborative applications save the data?

I have previously done some very basic real-time applications with the help of sockets and have been reading more about the topic out of curiosity. One very interesting article I read was about Operational Transformation, and I learned several new things. After reading it, I kept wondering when or how this data is actually saved to the database if I wanted to keep it. I have two assumptions/theories about what might be going on, but I'm not sure whether they are correct and/or the best solutions to this problem. They are as follows:
(For this example, let's assume it's a real-time collaborative whiteboard.)
For every edit that happens (e.g. drawing a line), the socket sends a message to everyone collaborating, and at the same time I store the data in my database. The problem I see with this solution is how often I would need to hit the database: for every line a user draws, I would have to write to it.
Use polling. For this theory, I think of saving all the data in temporary storage on the server and then, after 'x' amount of time, taking everything from that temporary storage and saving it in the database. The issue with this theory is the possibility of a failure of the temporary storage (e.g. a power failure). If the temporary storage loses its data before it is saved in the database, I would never be able to recover it.
How do similar real-time collaborative applications like Google Docs, Slides, etc. store the data in their databases? Are they following one of the theories I mentioned, or do they have a completely different way to store the data?
They probably rely on a log of changes + the latest document version + periodic snapshots (if they allow time-travelling through the document history).
It is similar to how most databases' transaction systems work. After validating that the change is legit, the database writes it to a very fast on-disk data structure -- the log -- which only appends the changed values. This log is mirrored in memory with a dedicated data structure to speed up reads.
When a read comes in, the database checks the in-memory data structure and merges the changes with what is stored in the cache or on disk.
Periodically, the changes present in memory and in the log are merged with the on-disk data structure.
So to summarize, in your case:
When an Operational Transformation arrives at the server, two things happen:
It is stored in the database as-is, to avoid any loss (the equivalent of the log).
It updates an in-memory data structure so the change can be replayed quickly if a user requests the latest version (the equivalent of the in-memory data structure).
When a user requests the latest document, the server checks the in-memory data structure and replays the changes against the last stored consolidated document, which might be lagging behind because of the following point.
Periodically, the log is applied to the "last stored consolidated document" to reduce the number of OTs that must be replayed to produce the latest document.
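A minimal sketch of the durable pieces in SQL (PostgreSQL-flavoured, all names invented; the in-memory replay cache lives in the application itself):

-- append-only log: every incoming operation is written here first (the "log")
CREATE TABLE ot_log (
    id         BIGSERIAL   PRIMARY KEY,
    doc_id     BIGINT      NOT NULL,
    op         JSONB       NOT NULL,   -- the serialized operation, e.g. "draw line"
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- consolidated snapshot, updated periodically ("last stored consolidated document")
CREATE TABLE doc_snapshot (
    doc_id     BIGINT PRIMARY KEY,
    content    JSONB  NOT NULL,
    last_op_id BIGINT NOT NULL    -- last ot_log.id already folded into content
);

-- serving the latest version: take the snapshot and replay the newer operations
SELECT s.content, l.op
FROM doc_snapshot s
LEFT JOIN ot_log l ON l.doc_id = s.doc_id AND l.id > s.last_op_id
WHERE s.doc_id = 42
ORDER BY l.id;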
Anyway, the best way to have a definitive answer is to look at open-source code that does what you are looking for, e.g. etherpad.

Replicate a database using snapshots and transaction logs

For learning purposes, I want to write my own database that is able to replicate itself. I have made some progress, but now I am facing a problem that I cannot solve. Suppose I have a database (let's call it the source) that I would like to replicate to another database (let's call it the target).
The basic principle is easy: in the source you don't store the actual tables, but instead a log of transactions. It's easy to send the transaction log over to the target, where the database then rebuilds itself. If you want to update the target, you simply request the part of the transaction log that has been added since then. Basically, this is what almost every database does.
While this works, it has one major drawback: if a table has existed for a long time, the transaction log is very long, and hence replicating the table takes a lot of time…
To avoid this you can store the current state as well. This means you have an up-to-date snapshot that you can copy fast. Additionally, the target has to subscribe to the transaction log of the source. Once it contains additional entries, the target applies them to its copied table. This works well, too, and it's way better in terms of performance and transferred volume.
But now I am facing a problem: suppose the snapshot is large; then it may happen that changes are made to it while it is being delivered. That means the copied snapshot contains some old and some new data. Now, how do I get the target database into a consistent state? Even if I know where to start in the transaction log, I either have to apply a change that was already applied to some of the records, or I have to leave it out, in which case the change is not applied at all to some other records.
Of course I could use the serializable isolation level, but then performance drops. Of course I could do what e.g. CouchDB does and remember the current table revision in every record, keeping a copy of every record for every revision. But then the required space grows enormously.
So, what shall I do?
Everything that I was able to find on the web either relies on replaying the entire transaction log, or on a process like CouchDB's, which takes up huge amounts of space.
Any ideas?
Your snapshot needs to be consistent, and you need to know at what point (with regard to the tx log) it is consistent. You then apply any transactions that have been committed since that point.
Obtaining a consistent snapshot can be done with exclusive locking, which may delay other transactions from committing, or using row versions (MVCC).
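In PostgreSQL terms, a rough sketch of the MVCC route (the table name and snapshot id are placeholders):

-- session 1: open a transaction, export its snapshot, and note the log position;
-- keep this session open until all copies are done
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();    -- e.g. returns '00000003-0000001B-1'
SELECT pg_current_wal_lsn();    -- roughly the point in the tx log to resume from

-- other sessions: copy the data exactly as it looked at that snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
SELECT * FROM source_table;     -- bulk-copy this to the target
COMMIT;

Lining the snapshot up exactly with a log position is the subtle part; PostgreSQL's logical replication handles it by having the creation of a replication slot return a consistent log position together with an exported snapshot, so the initial copy and the subsequent change stream meet at exactly one point.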
Good luck with your project.

How to use CHECKPOINT efficiently in an application that uses FILESTREAM

I use FILESTREAM to store BLOBs in my client-server application.
In the past I occasionally had to clear all BLOBs by executing a command like:
UPDATE BLOBTABLE set BLOBFIELD = NULL
This clears the blobs; I did it to make the DB backup smaller.
But to make the blobs "disappear from disk" I then needed to run:
CHECKPOINT
Note: this was done as a DBA activity, not as part of the software.
Now I realize that in my application I never call CHECKPOINT.
Maybe I should, every time I delete a blob -- should I?
Just to express myself better, here is an example of my real case:
My app allows users to store files (like PDF documents).
Those PDFs are saved as BLOBs in a FILESTREAM field.
As the user deletes them (from the UI) I run a DELETE command.
I do not call CHECKPOINT after it, so the garbage collection does not run.
Considering this, I realize that I do not have the whole thing under control.
So my question is simply: do I need to run CHECKPOINT every time I delete one of those files? Is there any drawback in doing this?
Thanks!
A database performs checkpoints at different moments; one of them is when a backup is performed.
Since a checkpoint triggers the garbage collection, it is normally not necessary (exceptions could be huge or complex scenarios) to call CHECKPOINT from the application, because the risk is that you reduce performance.
It is better to use CHECKPOINT as a maintenance activity if needed, keeping in mind that a database backup (or even stopping the SQL Server service) causes a checkpoint anyway.
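For example, as a scheduled maintenance script rather than something run after every delete (the column name and database name are invented; sp_filestream_force_garbage_collection exists from SQL Server 2012 onwards):

-- the application only deletes rows when the user removes a file
DELETE FROM BLOBTABLE WHERE DocumentId = 42;

-- scheduled maintenance job (or simply rely on the checkpoint a backup performs anyway)
CHECKPOINT;
EXEC sp_filestream_force_garbage_collection @dbname = N'MyDatabase';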

Does Active Directory's CommitChanges method work the same as a database commit transaction?

I do not have a way to test this right now, so could you confirm the question in the title?
I mean, in an ADO.NET database transaction I can update/insert thousands of records before committing to the database. In Active Directory, using System.DirectoryServices, it seems I need to commit for every entry (or record) that I update/insert.
Thanks.
Active Directory is not a transactional store - so you don't have the transaction support like you have with a database.
Your observation is absolutely correct - with Active Directory, you deal on a per-object basis; you can retrieve an object, manipulate it, and then save back all the changes (or discard them) - but you don't have any transaction support to roll back a whole series of operations.
If you really must have this capability, you'd have to write your own Resource Manager for AD (see some ideas here in MSDN) - this would allow you to wrap your AD operations in a TransactionScope() and roll them back. I don't think this is a trivial undertaking; otherwise, someone would have done it already.
So your current observations are absolutely correct, and without a whole lot of effort, this cannot be changed, unfortunately.

How do I rollback a transaction that has already been committed?

I am implementing an undo button for a lengthy operation on a web app. Since the undo will come in another request, I have to commit the operation.
Is there a way to issue a hint on the transaction like "maybe rollback"? So that after the transaction is committed, I could still roll it back from another process if needed.
Otherwise the Undo function will be as complex as the operation it is undoing.
Is this possible? Other ideas welcome!
Another option you might want to consider:
If it's possible for the user to 'undo' the operation, you may want to implement an 'intent' table where you can store pending operations. Once you go through your application flow, the user would need to Accept or Undo the operation, at which point you can just run the pending transaction and apply it to your database.
We have a similar system in place on our web application, where a user can submit a transaction for processing and has until 5pm on the day it's scheduled to run to cancel it. We store this in an intent table and process any transactions scheduled for that day after the daily cutoff time. In your case you would need an explicit 'Accept' or 'Undo' operation from the user after the initial 'lengthy operation', so that would change your process a little bit.
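A sketch of such an intent table (PostgreSQL-flavoured, every name invented for the example):

CREATE TABLE pending_operation (
    id         BIGSERIAL   PRIMARY KEY,
    user_id    BIGINT      NOT NULL,
    payload    TEXT        NOT NULL,                    -- what to apply if the user accepts
    status     TEXT        NOT NULL DEFAULT 'PENDING',  -- PENDING / ACCEPTED / UNDONE
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Accept: apply the payload inside a transaction, then mark the intent
UPDATE pending_operation SET status = 'ACCEPTED' WHERE id = 42;
-- Undo: nothing to roll back in the real tables, just discard the intent
UPDATE pending_operation SET status = 'UNDONE' WHERE id = 42;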
Hope this helps.
The idea in this case is to log, for each operation, a counter-operation that does the opposite, in a special log; when you need to roll back, you actually run the commands that you've logged.
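A sketch of that approach (all names invented; the compensating statement has to be captured in the same transaction as the original change):

CREATE TABLE undo_log (
    id           BIGSERIAL PRIMARY KEY,
    operation_id BIGINT NOT NULL,   -- groups all statements of one user-visible operation
    undo_sql     TEXT   NOT NULL    -- the inverse statement
);

BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 7;
INSERT INTO undo_log (operation_id, undo_sql)
VALUES (123, 'UPDATE account SET balance = balance + 100 WHERE id = 7');
COMMIT;

-- "undo" later = run the logged statements for operation 123 in reverse order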
Some databases have flashback technology that lets you ask the DB to go back to a certain date and time, but you need to understand how it works and make sure it will only affect the data you want and not other stuff...
Oracle Flashback
I don't think there is a similar technology in SQL Server (there is an SO answer that says there isn't), but SQL Server keeps evolving...
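For reference, a sketch of the Oracle side (the table name is invented; FLASHBACK TABLE requires row movement to be enabled):

ALTER TABLE orders ENABLE ROW MOVEMENT;
FLASHBACK TABLE orders TO TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;

-- or just read the old state without changing anything
SELECT * FROM orders AS OF TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;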
I'm not sure what technologies you're using here, but there are certainly better ways of going about this. You could start by storing the data in the session and committing when the user reaches the final page. Many frameworks these days also allow long-running transactions that span multiple requests.
However, you will probably want to commit at the end of every page and simply set some sort of flag for when the process is completed. That way if something goes wrong in the middle of it the user can recover and all is not lost.
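A sketch of that flag (names invented):

ALTER TABLE lengthy_operation ADD COLUMN completed BOOLEAN NOT NULL DEFAULT false;

-- commit each page's work as it happens, then flip the flag only at the final step
UPDATE lengthy_operation SET completed = true WHERE id = 42;

-- anything left with completed = false can be resumed or cleaned up later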
