Why is crash recovery much simpler for an append-only database?

In the book "Designing Data-Intensive Applications", it says that concurrency and crash recovery are simpler if segment files are append-only or immutable. For crash recovery, you don't need to worry about a crash happening while a value was being overwritten, leaving you with part of the old data and part of the new data.
However, I don't understand "leaving you with part of the old data and part of the new data". Even an append-only database will have part of the old data and part of the new data when appending new data is interrupted.

Databases report "data is committed" after the data is written to a log. In append-only databases, the log is the only place data is written to; there are no other data files to update, hence no complexity.
When an append-only database starts up, the only thing that happens is that the log file is read to build some in-memory structures for optimization. It does not matter whether this load is happening because of a crash or a regular box restart.
Impaler gave a good description of the complexity of non-append-only databases: there are multiple data structures, and there has to be a process to keep them in sync after a crash.
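To make the recovery story concrete, here is a minimal sketch in Python (not from the book; the length-plus-CRC record format and file name are just assumptions for illustration) of an append-only key-value log. A crash can only damage the tail of the file, so recovery simply replays the log and ignores a torn last record:

import os
import struct
import zlib

# Minimal append-only key-value log. Each record on disk is:
#   4-byte big-endian length | payload ("key=value") | 4-byte CRC32 of payload
RECORD_HEADER = struct.Struct(">I")
CRC = struct.Struct(">I")

def append(path, key, value):
    payload = f"{key}={value}".encode()
    with open(path, "ab") as f:
        f.write(RECORD_HEADER.pack(len(payload)))
        f.write(payload)
        f.write(CRC.pack(zlib.crc32(payload)))
        f.flush()
        os.fsync(f.fileno())  # the record is durable once this returns

def recover(path):
    # Rebuild the in-memory index by replaying the log. A crash mid-append
    # can only leave a truncated or corrupt *last* record; everything before
    # it is intact, so recovery just stops at the damaged tail.
    index = {}
    if not os.path.exists(path):
        return index
    with open(path, "rb") as f:
        while True:
            header = f.read(RECORD_HEADER.size)
            if len(header) < RECORD_HEADER.size:
                break  # torn header: ignore the tail
            (length,) = RECORD_HEADER.unpack(header)
            payload = f.read(length)
            crc = f.read(CRC.size)
            if len(payload) < length or len(crc) < CRC.size:
                break  # torn record: ignore the tail
            if zlib.crc32(payload) != CRC.unpack(crc)[0]:
                break  # corrupt tail: ignore it
            key, _, value = payload.decode().partition("=")
            index[key] = value  # later appends win
    return index

append("data.log", "user:1", "alice")
append("data.log", "user:1", "bob")
print(recover("data.log"))  # {'user:1': 'bob'}

Contrast this with an update-in-place store, where a crash mid-overwrite can leave a record that is half old value and half new value, and recovery has to detect and then undo or redo it.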

Related

What are the major differences between snapshot, dump, mirror, backup, archive, and checkpoint?

I am new to the SDDC (software-defined data center), and I find the explanations of these concepts on the Internet vague at best.
In particular, the last three concepts differ only subtly, and to make things worse, people sometimes use them interchangeably. What are their major differences? I also read this post, but the explanation still does not seem enough to answer my question.
Archiving is the process of moving data that is no longer actively used to a separate storage device for long-term retention.
Dumping is a bulk export of data that can help users either back up or duplicate a database.
Mirroring refers to the real-time operation of copying data, as an exact copy, from one location to a local or remote storage medium.
Snapshotting captures the state of a system at a particular point in time.
Backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event.
Checkpoint is a test operation that verifies data retrieved from the database by comparing that data with the baseline copy stored in your project.

How do real-time collaborative applications save their data?

I have previously built some very basic real-time applications with the help of sockets and have been reading more about the topic out of curiosity. One very interesting article I read was about Operational Transformation, and I learned several new things from it. After reading it, I kept wondering when and how this data is actually saved to the database if I wanted to keep it. I have two assumptions/theories about what might be going on, but I'm not sure whether they are correct and/or the best solutions to this problem. They are as follows:
(For this example, let's assume it's a real-time collaborative whiteboard.)
For every edit that happens (e.g. drawing a line), the socket sends a message to everyone collaborating, and at the same time I store the data in my database. The problem I see with this solution is how often I would need to access the database: for every line a user draws, I would have to hit the database to store it.
Use polling. For this theory, I think of saving every edit in temporary storage on the server, and then after 'x' amount of time, fetching all the data from the temporary storage and saving it in the database. The issue with this theory is the possibility of a failure of the temporary storage (e.g. an electrical failure). If the temporary storage loses its data before it is saved to the database, I would never be able to recover it.
How do similar real-time collaborative applications like Google Docs, Slides, etc. store the data in their databases? Are they following one of the theories I mentioned, or do they have a completely different way to store the data?
They probably rely on a log of changes + the latest document version + periodic snapshots (if they allow time-travelling through the document history).
It is similar to how most databases' transaction systems work. After validating that the change is legitimate, the database writes the change to a very fast on-disk data structure, a.k.a. the log, which only ever appends the changed values. This log is mirrored in memory with a dedicated data structure to speed up reads.
When a read comes in, the database checks the in-memory data structure and merges the changes with what is stored in the cache or on disk.
Periodically, the changes that are present in memory and in the log are merged into the on-disk data structure.
So to summarize, in your case:
When an Operational Transformation arrives at the server, two things happen:
It is stored in the database as is, to avoid any loss (the equivalent of the log).
It updates an in-memory data structure so that the change can be replayed quickly when a user requests the latest version (the equivalent of the in-memory data structure).
When a user requests the latest document, the server checks the in-memory data structure and replays the changes against the last stored consolidated document, which might be lagging behind because of the following point.
Periodically, the log is applied to the "last stored consolidated document" to reduce the number of OTs that must be replayed to produce the latest document.
Anyway, the best way to get a definitive answer is to look at open-source code that does what you are looking for, e.g. Etherpad.
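A minimal Python sketch of that flow (everything here is illustrative: the class name, the apply_op helper and the toy "insert" operation are assumptions, and a real OT engine would also transform concurrent operations against each other before applying them):

import json

class CollaborativeDocStore:
    # Every incoming op is appended to a durable log first, kept in memory
    # for fast replay, and periodically folded into a consolidated snapshot.

    def __init__(self, log_path):
        self.log_path = log_path
        self.pending_ops = []  # in-memory ops since the last consolidation
        self.snapshot = ""     # last stored consolidated document

    def receive_op(self, op):
        # 1. Durable first: append the raw op to the log (the "log" above).
        with open(self.log_path, "a") as log:
            log.write(json.dumps(op) + "\n")
        # 2. Keep it in memory so reads don't have to re-read the log.
        self.pending_ops.append(op)

    def latest_document(self):
        # Replay pending ops on top of the last consolidated snapshot.
        doc = self.snapshot
        for op in self.pending_ops:
            doc = apply_op(doc, op)
        return doc

    def consolidate(self):
        # Periodic merge: fold pending ops into the snapshot so future reads
        # (and recovery) have fewer operations to replay.
        self.snapshot = self.latest_document()
        self.pending_ops.clear()

def apply_op(doc, op):
    # Toy "operation": insert text at a position.
    if op["type"] == "insert":
        pos = op["pos"]
        return doc[:pos] + op["text"] + doc[pos:]
    return doc

store = CollaborativeDocStore("ops.log")
store.receive_op({"type": "insert", "pos": 0, "text": "Hello"})
store.receive_op({"type": "insert", "pos": 5, "text": ", world"})
print(store.latest_document())  # Hello, world
store.consolidate()

The consolidate step plays the same role as the periodic merge described above: it bounds both recovery time and the number of operations a read has to replay.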

How to use CHECKPOINT efficiently in an application that uses FILESTREAM

I use FILESTREAM to store BLOBs in my client-server application.
In the past I had, from time to time, to clear all BLOBs by executing a command like:
UPDATE BLOBTABLE SET BLOBFIELD = NULL
This clears the blobs; I did this to make the DB backup smaller.
But to make the blobs "disappear from disk" I then need to run:
CHECKPOINT
Note: this was done as a DBA activity, not as part of the software.
Now I realize that in my application I never call CHECKPOINT.
Maybe I should, every time I delete a blob. Should I?
Just to express myself better, here is an example of my real case:
My app allows storing files (like PDF documents).
Those PDFs are saved as blobs in a FILESTREAM field.
As the user deletes them (from the UI), I run a DELETE command.
I do not call CHECKPOINT after it, so the garbage collection does not run.
Considering this, I realize that I do not have the whole thing under control.
So my question is simply: do I need to run CHECKPOINT every time I delete one of those files? Is there any drawback to doing this?
Thanks!
A database performs checkpoints at different moments; one of them is when a backup is performed.
Since a checkpoint triggers the garbage collection, it is not necessary (exceptions could be huge or complex scenarios) to call CHECKPOINT from the application, because you risk reducing performance.
It is better to run CHECKPOINT as a maintenance activity if needed, keeping in mind that a database backup (or even stopping the SQL Server service) causes a checkpoint as a consequence.
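If you do decide to force it occasionally, here is a minimal sketch of such a maintenance job, in Python via pyodbc (the driver, connection string and schedule are assumptions; BLOBTABLE/BLOBFIELD come from the question):

import pyodbc  # assumed client library; any SQL Server driver would do

# Hypothetical connection string; adjust server, database and authentication.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyAppDb;Trusted_Connection=yes"
)

def clear_blobs_and_checkpoint():
    # Occasional maintenance job: clear the blobs (as in the question), then
    # force a checkpoint so the FILESTREAM garbage collector can reclaim the
    # files on disk. Not something to run after every single DELETE.
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    try:
        cur = conn.cursor()
        cur.execute("UPDATE BLOBTABLE SET BLOBFIELD = NULL")  # table/column from the question
        cur.execute("CHECKPOINT")  # triggers FILESTREAM garbage collection
    finally:
        conn.close()

if __name__ == "__main__":
    clear_blobs_and_checkpoint()

Running something like this occasionally (or simply relying on backups and the automatic checkpoints) keeps garbage collection out of the hot path of every DELETE.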

What (kind of) data is lost when using REPAIR_ALLOW_DATA_LOSS?

After having some trouble with my SQL Server 2012, I could only fix the data inconsistencies using DBCC CHECKDB (xxx, REPAIR_ALLOW_DATA_LOSS). The option's name implies that there will (possibly) be data loss when the database is repaired.
What data is lost, and how harmful is the loss?
For example, take a look at this log message:
The off-row data node at page (1:705), slot 0, text ID 328867287793664 is not referenced.
If that node is not referenced, and it is this node that causes the inconsistency, delete it. That shouldn't hurt anyone. Is that the kind of data loss MS is talking about?
Best regards,
Sascha
Check Paul Randal's blog post for some additional insight into the implications of running DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS:
REPAIR_ALLOW_DATA_LOSS is the repair level that DBCC CHECKDB recommends when it finds corruptions. This is because fixing nearly
anything that’s not a minor non-clustered index issue requires
deleting something to repair it. So, REPAIR_ALLOW_DATA_LOSS will
delete things. This means it will probably delete some of your data as
well. If, for instance it finds a corrupt record on a data page, it
may end up having to delete the entire data page, including all the
other records on the page, to fix the corruption. That could be a lot
of data. For this reason, the repair level name was carefully chosen.
You can’t type in REPAIR_ALLOW_DATA_LOSS without realizing that you’re
probably going to lose some data as part of the operation.
I’ve been asked why this is. The purpose of repair is not to save user
data. The purpose of repair is to make the database structurally
consistent as fast as possible (to limit downtime) and correctly (to
avoid making things worse). This means that repairs have to be
engineered to be fast and reliable operations that will work in every
circumstance. The simple way to do this is to delete what’s broken and
fix up everything that linked to (or was linked from) the thing being
deleted – whether a record or page. Trying to do anything more
complicated would increase the chances of the repair not working, or
even making things worse.
The ramifications of this are that running REPAIR_ALLOW_DATA_LOSS can
lead to the same effect on your data as rebuilding a transaction log
with in-flight transactions altering user data – your business logic,
inherent and constraint-enforced relationships between tables, and the
basic logical integrity of your user data could all be broken. BUT,
the database is now structurally consistent and SQL Server can run on
it without fear of hitting a corruption that could cause a crash.
To continue the contrived example from above, imagine your bank
checking and savings accounts just happen to be stored on the same
data page in the bank’s SQL Server database. The new DBA doesn’t
realize that backups are necessary for disaster recovery and data
preservation and so isn’t taking any. Disaster strikes again in the
form of the work crew outside the data-center accidentally cutting the
power and the machine hosting SQL Server powers down. This time, one
of the drives has a problem while powering down and a page write
doesn’t complete – causing a torn page. Unfortunately, it’s the page
holding your bank accounts. As the DBA doesn’t have any backups, the
only alternative to fix the torn-page is to run
REPAIR_ALLOW_DATA_LOSS. For this error, it will delete the page, and
does so. In the process, everything else on the page is also lost,
including your bank accounts!!
Source: Corruption: Last resorts that people try first…
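For reference, repair options require the database to be in single-user mode, so a last-resort run typically looks like the sketch below (Python via pyodbc is just one way to script it; the connection string and the MyDb placeholder for "xxx" are assumptions):

import pyodbc  # assumed client library; the T-SQL statements are what matter

# Hypothetical connection to the master database of the affected instance.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=master;Trusted_Connection=yes"
)
DB = "MyDb"  # placeholder for the corrupted database ("xxx" in the question)

conn = pyodbc.connect(CONN_STR, autocommit=True)  # ALTER DATABASE cannot run inside a user transaction
cur = conn.cursor()

# Repair options require single-user mode; kick out other sessions first.
cur.execute(f"ALTER DATABASE [{DB}] SET SINGLE_USER WITH ROLLBACK IMMEDIATE")
try:
    # Last resort only, and only after backing up what you still have:
    # this may delete whole pages (and every row on them) to restore consistency.
    cur.execute(f"DBCC CHECKDB (N'{DB}', REPAIR_ALLOW_DATA_LOSS)")
    while cur.nextset():  # drain the informational result sets
        pass
finally:
    cur.execute(f"ALTER DATABASE [{DB}] SET MULTI_USER")
    conn.close()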

What is faster: database queries or file writing/reading?

I know that in normal cases it is faster to read/write from a file, but suppose I created a chat system:
Would it be faster to write to and read from a file, or to insert/select data in a DB and cache the results?
A database is faster. And, importantly for you, it deals with concurrent access.
Do you really want a mechanical disk action every time someone types? Writing to disk is a horrible idea. Cache messages in memory and clear a message once it has been sent to all users in the room. The cache will stay small, and most of the time it will be empty. This is your best option if you don't need a history log.
But if you need a log....
If you write a large amount of data in one pass, I guarantee the file will smoke database insert performance. A bulk-insert feature of the database may match the file, but it requires a file data source to begin with. You would need to queue up a lot of messages in memory, then periodically flush them to the file.
For many small writes the gap will close and the database will pull ahead. Indexes will influence the insert speed; if thousands of users are inserting into a heavily indexed table, you may have problems.
Do your own tests to prove what is faster. Simulate a realistic load, not a one-user test.
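A minimal Python sketch of that "queue in memory, flush in batches" idea (sqlite3 stands in for whatever database you use; the table name and flush interval are arbitrary choices):

import sqlite3
import threading
import time

class ChatMessageBuffer:
    # Incoming messages go into an in-memory list; a background thread
    # periodically writes them to the database in one batched insert.

    def __init__(self, db_path="chat.db", flush_every=5.0):
        self.db = sqlite3.connect(db_path, check_same_thread=False)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages (room TEXT, author TEXT, body TEXT, ts REAL)"
        )
        self.buffer = []
        self.lock = threading.Lock()
        self.flush_every = flush_every
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def add(self, room, author, body):
        # Called on every incoming message: no disk I/O on this path.
        with self.lock:
            self.buffer.append((room, author, body, time.time()))

    def flush(self):
        # One batched insert instead of one insert per message.
        with self.lock:
            if not self.buffer:
                return
            self.db.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", self.buffer)
            self.db.commit()
            self.buffer.clear()

    def _flush_loop(self):
        while True:
            time.sleep(self.flush_every)
            self.flush()

buf = ChatMessageBuffer()
buf.add("room1", "alice", "hi")
buf.add("room1", "bob", "hello")
buf.flush()  # or just wait for the background thread
print(buf.db.execute("SELECT COUNT(*) FROM messages").fetchone()[0])  # 2 on a fresh database

As the question above about collaborative apps also notes, whatever is still in the buffer is lost if the process dies before a flush, so the interval is a trade-off between write volume and how much history you can afford to lose.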
Databases by far.
Databases are optimized for data that is constantly updated and changed, as in your case. File storage is for long-term storage with few changes.
(Even if files were faster, I would still go with a database, because it's easier to develop and maintain.)
Since I presume your system would write/read data continuously (as people type their messages), writing to a file would take longer because of the file-handling procedure, i.e.:
open file for writing
lock file
write & save
unlock file
I would go with db.
