Replicate a database using snapshots and transaction logs - database

For learning purposes, I want to write my own database, that is able to replicate itself. I have made some progress, but now I am facing a problem that I can not solve. Supposed I have a database (let's call this source) that I would like to replicate to another database (let's call this target).
The basic principle is easy: In the source you don't store actual tables, but instead a log of transactions. It's easy to send over the transaction log to the target, where the database then rebuilds itself. If you want to update the target, you simply request the part of the transaction log that has changed ever since. Basically this is what almost every database does.
While this works, it has one major drawback: If a table already exists for a long time, the transaction log is very long, and hence replicating the table requires lots of time…
To avoid this you can store the current state as well. This means you have an up-to-date snapshot that you can copy fast. Additionally, the target has to subscribe to the transaction log of the source. Once it contains additional entries, the target applies them to its copied table. This works well, too, and it's way better in terms of performance and transferred volume.
But now I am facing a problem: Supposed the snapshot is large, then it may happen that changes are made to it while it is being delivered. That means that the copied snapshot contains some old and some new data. Now, how do I get the target database in a consistent state? Even if I know from where to start the transaction log, I either have to apply a change that was already applied to some of the records, or I have to leave it out, but then a change is not applied at all to some other records.
Of course I could use the isolation level sequential, but then performance drops. Of course I could do what e.g. CouchDB does and remember the current table revision in every record, and keep a copy of every record for every revision. But then the required space grows enormously.
So, what shall I do?
Everything that I was able to find on the web always either relies on the idea of replaying the entire transaction log, or by using a process as in CouchDB which takes up huge amounts of space.
Any ideas?

Your snapshot needs to be consistent and you need to know at what time (in regards to the tx log) it is consistent. You then apply any transactions that have been committed since this point.
Obtaining a consistent snapshot can be done with exclusive locking, which may delay other transactions from committing, or using row versions (MVCC).
Good luck with your project.

Related

How does real-time collaborative applications saves the data?

I have previously done some very basic real-time applications using the help of sockets and have been reading more about it just for curiosity. One very interesting article I read was about Operational Transformation and I learned several new things. After reading it, I kept thinking of when or how this data is really saved to the database if I were to keep it. I have two assumptions/theories about what might be going on, but I'm not sure if they are correct and/or the best solutions to solve this issue. They are as follow:
(For this example lets assume it's a real-time collaborative whiteboard:)
For every edit that happens (ex. drawing a line), the socket will send a message to everyone collaborating. But at the same time, I will store the data in my database. The problem I see with this solution is the amount of time I would need to access the database. For every line a user draws, I would be required to access the database to store it.
Use polling. For this theory, I think of saving every data in temporal storage at the server, and then after 'x' amount of time, it will get all the data from the temporal storage and save them in the database. The issue for this theory is the possibility of a failure in the temporal storage (ex. electrical failure). If the temporal storage loses its data before it is saved in the database, then I would never be able to recover them again.
How do similar real-time collaborative applications like Google Doc, Slides, etc stores the data in their databases? Are they following one of the theories I mentioned or do they have a completely different way to store the data?
They prolly rely on logs of changes + latest document version + periodic snapshot (if they allow time traveling the document history).
It is similar to how most database's transaction system work. After validation the change is legit, the database writes the change in very fast data-structure on disk aka. the log that will only append the changed values. This log is replicated in-memory with a dedicated data-structure to speed up reads.
When a read comes in, the database will check the in-memory data-structure and merge the change with what is stored in the cache or on the disk.
Periodically, the changes that are present in memory and in the log, are merged with the data-structure on-disk.
So to summarize, in your case:
When an Operational Transformation comes to the server, two things happens:
It is stored in the database as is, to avoid any loss (equivalent of the log)
It updates an in-memory datastructure to be able to replay the change quickly in case an user request the latest version (equivalent of the memory datastructure)
When an user request the latest document, the server check the in-memory datastructre and replay the changes against the last stored consolidated document that might be lagging behind because of the following point
Periodically, the log is applied to the "last stored consolidated document" to reduce the amount of OT that must be replayed to produce the latest document.
Anyway, the best way to have a definitive answer is to look at open-source code that does what you are looking for, e.g. etherpad.

Simultaneous access to database; keeping data consistent across all connections

I made a database system right here:
(comments on the normalization are highly appreciated as well - I have a feeling you'll hate me on what I did with tblIsolateSensitivity; tblHAIFile only has a bunch of Boolean fields and foreign keys).
Let's say we have x number of terminals accessing the database. X1 edits Patient 01, X2 edits Patient 02, X3 deletes Patient 01 at the same time. How can I ensure that the data between the three terminals are all up-to-date and consistent?
At the moment, I am querying the data only when the query is needed to be done (ie: when the user searches for a record, or if the program needs to verify something against a database record), meaning the data is only as updated as the most recent query that the user makes. This makes it difficult to ensure that the data is up-to-date on all terminals. Of course, for deleted entries, I have error handling to handle that, but for the rest, well...
So, my question is: how do you guys typically handle this kind of situation? Is there a name for this concept so that I can look it up and read long?
From a database design perspective, you should read up on optimistic concurrency and pessimistic concurrency. These are two options for making sure that you either don't have two users modifying the same record at the same time, or at least if you do allow that, the conflict is detected so it can be resolved.
The basic idea behind optimistic concurrency is that you allow multiple users to view and modify the data at the same time, on the assumption that this will be relatively rare. However, before any user writes changes to the data, a check is made to ensure that the underlying data hasn't changed since it was originally read. In some cases you do this manually with a read before update, checking each column value against a cached value. However, that is cumbersome. Some DBMS systems have features that make this simpler. For example, SQL Server has the ROWVERSION (formerly known as TIMESTAMP) data type, which lets you check easily using a single value whether someone else has changed a record since the last time you read it.
The basic idea behind pessimistic concurrency is that you put a lock on a record in the expectation that you're going to change it. While you hold the lock, the DBMS will prevent anyone else from getting their own lock.
The advantage of optimistic concurrency is that it's pretty light weight, doesn't interfere too much with your application, and let's you manually (or automatically) resolve any conflicts on those rare occasions when they happen. You also don't have to worry about someone reading a record, locking it and then going home for the weekend.
The advantage of pessimistic concurrency is that it prevents collisions, but it can stop one user from working while they wait for another to finish what they're doing.
From the perspective of notifying users when records change in the background (i.e. they're changed by another user) that isn't a database design feature. It may be a feature of your application logic or of your application's data access layer.

Concurrent editing of same data

I recently came up with a case that makes me wonder if I'm a newbie or something trivial has escaped to me.
Suppose I have a software to be run by many users, that uses a table. When the user makes login in the app a series of information from the table appears and he has just to add and work or correct some information to save it. Now, if the software he uses is run by many people, how can I guarantee is he is the only one working with that particular record? I mean how can I know the record is not selected and being worked by 2 or more users at the same time? And please I wouldn't like the answer use “SELECT FOR UPDATE... “
because for what I've read it has too negative impact on the database. Thanks to all of you. Keep up the good work.
This is something that is not solved primarily by the database. The database manages isolation and locking of "concurrent transactions". But when the records are sent to the client, you usually (and hopefully) closed the transaction and start a new one when it comes back.
So you have to care yourself.
There are different approaches, the ones that come into my mind are:
optimistic locking strategies (first wins)
pessimistic locking strategies
last wins
Optimistic locking: you check whether a record had been changed in the meanwhile when storing. Usually it does this by having a version counter or timestamp. Some ORMs and frameworks may help a little to implement this.
Pessimistic locking: build a mechanism that stores the information that someone started to edit something and do not allow someone else to edit the same. Especially in web projects it needs a timeout when the lock is released anyway.
Last wins: the second person storing the record just overwrites the first changes.
... makes me wonder if I'm a newbie ...
That's what happens always when we discover that very common stuff is still not solved by the tools and frameworks we use and we have to solve it over and over again.
Now, if the software he uses is runed by many people how can I guarantee is he
is the only one working with that particular record.
Ah...
And please I wouldn't like the answer use “SELECT FOR UPDATE... “ because for
what I've read it has too negative impact on the database.
Who cares? I mean, it is the only way (keep a lock on a row) to guarantee you are the only one who can change it. Yes, this limits throughput, but then this is WHAT YOU WANT.
It is called programming - choosing the right tools for the job. IN this case impact is required because of the requirements.
The alternative - not a guarantee on the database but an application server - is an in memory or in database locking mechanism (like a table indicating what objects belong to what user).
But if you need to guarantee one record is only used by one person on db level, then you MUST keep a lock around and deal with the impact.
But seriously, most programs avoid this. They deal with it either with optimistic locking (second user submitting changes gets error) or other programmer level decisions BECAUSE the cost of such guarantees are ridiculously high.
Oracle is different from SQL server.
In Oracle, when you update a record or data set the old information is still available because your update is still on hold on the database buffer cache until commit.
Therefore who is reading the same record will be able to see the old result.
If the access to this record though is a write access, it will be a lock until commit, then you'll have access to write the same record.
Whenever the lock can't be resolved, a deadlock will pop up.
SQL server though doesn't have the ability to read a record that has been locked to write changes, therefore depending which query you're running, you might lock an entire table
First you need to separate queries and insert/updates using a data-warehouse database. Which means you could solve slow performance in update that causes locks.
The next step is to identify what is causing locks and work out each case separately.
rebuilding indexes during working hours could cause very nasty locks. Push them to after hours.

Keeping handsets in sync with a database

On the system I am developing, we have a PostgreSQL database that contains set up data which when updated must be transfered to handsets when and while those handsets are "docked". While the handsets are docked, our "service software" can talk to them, but not while they are undocked (they are not wireless).
At the moment, the service software that the handsets talk to loads the set up data from the database on startup and caches it. Thereafter it queries the latest timestamp of the setup data every 5 seconds and reloads parts of the set up if the timestamp queried is higher than the latest cached timestamp.
However, I find this method haphazard. It may be possible to miss an update for instance if an update transaction takes longer than a second, or at least if the period between submitting the transaction and completion of the transaction takes it over a 1-second boundary (the now() function is resolved at the beginning of the transaction by PostgreSQL). The only way I can think of round that is to do a table level lock before querying the latest timestamp. I'm not a fan of table locks but it is the only way I can think of to get round the problem.
Another problem with this approach is, I have to query for new data based on the update timetamp being >= the last latest timestamp, as opposed to just > the last latest timestamp. Why? Because a record may of been committed within the same second, just after my query - so I'd miss the record.
Another approach I've thought of is, storing "last synchronised date-time" data in the database for each logical item of data that must be stored on the handsets. I would do this on a per handset basis. I can then periodically query for all data not currently synchronised on a particular handset, and then mark it as synchronised once the handset is up to date (I have worked out a mechanism for this to be failsafe which takes into account the data being updated during synchronisation).
My only problem with this approach is that it means the database is storing non-business centric data - as in it is storing data to make the system work. I'm not convinced data about what handsets are in sync is "business" data. To me it is more the responsibility of the handset service software / handset software to know how to keep itself up to date, though it is tempting as it describes perfectly what data is and is not on each handset and allows queries to only return the data needed.
The first approach however at least only uses data appropriate to the business - i.e. the timestamp of when the data was last changed.
The ideal way would be to use some kind of notification system, but unfortunately postgres only has a basic NOTIFY / EVENT system and that doesn't seem to work over ODBC (which I foolishly decided to use and do not have time to change just now). If I was using Oracle I could use Streams..
Thoughts?
Note: The database is purely relational - I am not interested in any "object oriented" approaches to this problem or any framework based solutions.
Thanks.
First of all, if you are using PostgreSQL version at least 7.2, the now function returns values with microsecond precision rather that second precision; although the value is ultimately derived from the operating system and will be accurate only up to a few hundredths of a second.
The method that you describe appears to be safe against permanently missing any updates. Just make sure that you reload data every time unless the timestamps prove that you have reloaded long enough after the last update. Alternately, you could update a timestamp upon data update in a separate transaction; in that case, ever seeing such a timestamp is a proof that all updates had finished before the timestamp value.
Another approach I've thought of is, storing "last synchronised
date-time" data in the database for each logical item of data that
must be stored on the handsets. I would do this on a per handset
basis. I can then periodically query for all data not currently
synchronised on a particular handset, and then mark it as synchronised
once the handset is up to date
I can not recommend this for the following reasons:
As synchronization is a state of a handset and not a state of the data, this information should better be stored on the handset.
The database should be scalable to many handsets and it ideally should not have to keep track of them.
If a handset can change its identity, or be wiped or restored (reimaged) to a previous state without changing its identity, the database will get out of sync with the real state of the headset and no mechanism will ensure proper synchronization.
While NOTIFY is certainly preferable to constant polling, it is a problem orthogonal to where you store the synchronization progress. You still need to have a polling capability to be able to deal with a freshly connected devide, and notifications would be just a bandwidth/latency optimization.

Versioning a dataset in an RDBMS using initials and deltas

I'm working on a system that mirrors remote datasets using initials and deltas. When an initial comes in, it mass deletes anything preexisting and mass inserts the fresh data. When a delta comes in, the system does a bunch of work to translate it into updates, inserts, and deletes. Initials and deltas are processed inside long transactions to maintain data integrity.
Unfortunately the current solution isn't scaling very well. The transactions are so large and long running that our RDBMS bogs down with various contention problems. Also, there isn't a good audit trail for how the deltas are applied, making it difficult to troubleshoot issues causing the local and remote versions of the dataset to get out of sync.
One idea is to not run the initials and deltas in transactions at all, and instead to attach a version number to each record indicating which delta or initial it came from. Once an initial or delta is successfully loaded, the application can be alerted that a new version of the dataset is available.
This just leaves the issue of how exactly to compose a view of a dataset up to a given version from the initial and deltas. (Apple's TimeMachine does something similar, using hard links on the file system to create "view" of a certain point in time.)
Does anyone have experience solving this kind of problem or implementing this particular solution?
Thanks!
have one writer and several reader databases. You send the write to the one database, and have it propagate the exact same changes to all the other databases. The reader databases will be eventually consistent and the time to update is very fast. I have seen this done in environments that get upwards of 1M page views per day. It is very scalable. You can even put a hardware router in front of all the read databases to load balance them.
Thanks to those who tried.
For anyone else who ends up here, I'm benchmarking a solution that adds a "dataset_version_id" and "dataset_version_verb" column to each table in question. A correlated subquery inside a stored procedure is then used to retrieve the current dataset_version_id when retrieving specific records. If the latest version of the record has dataset_version_verb of "delete", it's filtered out of the results by a WHERE clause.
This approach has an average ~ 80% performance hit so far, which may be acceptable for our purposes.

Resources