Does anyone know which file system semantics current database systems (e.g. MySQL) need? I searched the net and found that for Berkeley DB you can read that every file system with POSIX semantics can be used. But I wonder if true POSIX semantics are really needed or if a subset is sufficient.
Any hint will be appreciated.
One way to look for your answer is to search for answers to questions such as "running Berkeley DB over NFS". Since NFS is very common but has relaxed semantics, these questions have surely been asked before.
It might not be a complete answer, but section 2.0 of the article "Atomic Commit In SQLite" discusses the assumptions made about the underlying storage.
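As a rough, hedged illustration of the kind of primitive those assumptions tend to boil down to (this is not SQLite's actual journaling code, and the file name and payload are invented): atomic-commit schemes generally rely on being able to force writes to stable storage in a known order, i.e. an fsync-style flush barrier, rather than on the full POSIX surface. In Java that guarantee is exposed via FileChannel.force:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushBarrier {
    public static void main(String[] args) throws IOException {
        // Hypothetical journal file; the point is the ordering write -> force.
        Path journal = Path.of("journal.log");
        try (FileChannel ch = FileChannel.open(journal,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("commit-record".getBytes()));
            // force(true) asks the OS to flush data and metadata to stable
            // storage -- the fsync-style guarantee atomic commit depends on.
            ch.force(true);
        }
    }
}
```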
I'm trying to replace GDBM in an application with a better key-value storage manager, and one of my objectives is to use the same database file across different architecture platforms. In particular, this means the file format should be independent of endianness and of whether the architecture is 32-bit or 64-bit.
Does anyone know whether Tkrzw or LevelDB satisfies this? Or any other key-value DBM?
Since Tkrzw is a new library, I asked the developer on GitHub, and according to them it is indeed independent (source).
I also asked the authors of LevelDB, and the answer was also positive (source).
Is there any documentation about which optimizations are performed (and how) when transforming the application DAG into the physical DAG?
This is the "official" page of the Flink optimizer, but it is awfully incomplete.
I've also found other information directly in the source code on the official GitHub page, but it is very difficult to read and understand.
Other information can be found here (section 6) (Stratosphere was the previous name of Flink before it was incubated by Apache) and here (section 5.1) for more theoretical documentation.
I hope it will be documented better in the future :-)
As I understand it, in a distributed system we are supposed to handle network partition failures, which is solved by using multiple copies of the same data.
Is this the only place where we use a consensus algorithm?
What is the difference between 2PC/3PC/Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC and 3PC also kinds of consensus algorithms?)
Network partitions are not "solved" by having many copies of the same data, although redundancy is of course essential for dealing with any sort of failure. :)
There are many other problems related to network partitions. Generally, to increase tolerance to network partitions you use an algorithm that relies on a quorum rather than on total communication: in the quorum approach you can still make progress on one side of the partition as long as a majority (f+1 nodes out of 2f+1) is reachable. Paxos, for example, uses the quorum approach. It is quite clear that a protocol like 2PC cannot make progress during any kind of network partition, since it requires "votes" from all of the nodes.
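To make the quorum idea concrete, here is a hedged toy sketch (not a real consensus implementation; the class name and the numbers are invented for illustration): with n = 2f+1 replicas, a decision only needs acknowledgements from a majority of f+1, so the side of a partition that still contains a majority can keep making progress, and any two majorities overlap in at least one node.

```java
// Toy illustration of majority quorums; not a real Paxos implementation.
public class QuorumCheck {
    public static void main(String[] args) {
        int f = 2;                 // failures to tolerate (assumed for the example)
        int n = 2 * f + 1;         // total replicas: 5
        int quorum = f + 1;        // majority: 3

        int reachable = 3;         // nodes on our side of a hypothetical partition
        boolean canProgress = reachable >= quorum;
        System.out.printf("n=%d, quorum=%d, reachable=%d -> progress: %b%n",
                n, quorum, reachable, canProgress);

        // Any two majorities of 5 nodes share at least one node, which is why
        // quorum-based protocols stay consistent across a partition.
    }
}
```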
What is the difference between 2PC/3PC/Paxos? (Is Paxos a modified version of 3PC? If so, are 2PC and 3PC also kinds of consensus algorithms?)
2PC, 3PC and Paxos are all variants of consensus protocols, although 2PC and 3PC are often described as handling the more specific scenario of "atomic commit in a distributed system", which is essentially a consensus problem. 2PC, 3PC and Paxos are similar but different. You can easily find detailed information about each algorithm on the web.
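For a rough feel of why 2PC blocks during a partition, here is a hedged sketch of a 2PC coordinator in Java (the Participant interface and class names are hypothetical; a real implementation needs write-ahead logging, timeouts, and recovery). It only shows the two phases: collect votes from every participant, then broadcast the outcome.

```java
import java.util.List;

// Minimal sketch of a two-phase-commit coordinator; not production code.
interface Participant {
    boolean prepare();   // phase 1: vote yes/no
    void commit();       // phase 2: apply the transaction
    void abort();        // phase 2: roll it back
}

class TwoPhaseCommitCoordinator {
    boolean runTransaction(List<Participant> participants) {
        // Phase 1: every participant must vote. If any participant is
        // unreachable (e.g. due to a partition), the coordinator cannot decide.
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break;
            }
        }
        // Phase 2: broadcast the outcome to all participants.
        for (Participant p : participants) {
            if (allYes) p.commit(); else p.abort();
        }
        return allYes;
    }
}
```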
Is this the only place where we use a consensus algorithm?
Consensus protocols have many use cases in distributed systems, for example atomic commit, atomic broadcast, leader election, or basically any algorithm that requires a set of processes to agree upon some value or action.
Caveat: consensus protocols and the related problems of distributed systems are not trivial, and you'll need to do some reading to get a deep understanding. If you are comfortable reading academic papers, you can find most of the famous ones online, for example "Paxos Made Simple" by Leslie Lamport, or you can find good blog posts by googling. The Wikipedia article on Paxos is also of very good quality, in my opinion!
I hope that answers some of your questions, although I have probably introduced you to even more! (If you're interested, you have some research to do.)
Could anyone explain the term "data compression" in databases, in layman's terms? Sorry if this question is simple, but it would help me.
I did find the technical definition, but I still did not get a proper understanding.
Data compression saves space, reduces read times, and so on. Does that mean it is aggregating data in the table? Please clarify.
Since you've tagged this post with db2, I'm assuming you're asking about compression within the database. DB2 does dictionary compression: it replaces common strings with shorter tokens on the actual data pages, reducing the size of the table.
Please see the Wikipedia article on Dictionary Coder for a general discussion of how this kind of algorithm works.
If you're using DB2 for Linux, UNIX and Windows you can read this developerWorks article that describes the compression specifically in DB2. The article is a few years old but it holds true today (even though there have been many enhancements beyond the initial release).
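As a hedged toy illustration of the dictionary idea described above (this is not DB2's actual algorithm or on-page format; the dictionary entries and row layout are invented): frequently occurring values are mapped to short tokens, and the stored rows hold the tokens instead of the full strings.

```java
import java.util.Map;

// Toy illustration of dictionary (token) compression. DB2's real row
// compression works on data pages with a per-table dictionary, not on strings.
public class DictionaryCompressionSketch {
    public static void main(String[] args) {
        // Hypothetical dictionary: frequent values get one-character tokens.
        Map<String, String> dictionary = Map.of(
                "United States of America", "\u0001",
                "California", "\u0002");

        String row = "John Smith|California|United States of America";
        String compressed = row;
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            compressed = compressed.replace(e.getKey(), e.getValue());
        }
        System.out.println("original   : " + row.length() + " chars");
        System.out.println("compressed : " + compressed.length() + " chars");
    }
}
```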
According to the Wikipedia article on object-relational mapping:
is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages
I thought an ORM also took care of transferring the data between the application and the database. Is that not necessarily true?
EDIT: After reading the answers, I don't know if it's possible to choose a definitively correct answer to this question, since it is perhaps subjective to some degree. On the one hand, it is true that the ORM per se may not perform the transfer of the data; rather, JDBC or some other similar technology does. On the other hand, the ORM is the actor responsible for delegating this task to JDBC, and for that reason it can be thought of as being "in charge" of the transfer.
The article is referring to the concept of object relational mapping, rather than any software implementation of it, such as Hibernate, which indeed does what you mentioned (possibly delegating the job to other mechanisms).
Either way, it's a collaborative encyclopaedia, so you can always edit that article if you think you can make it more clear.
The transfer of data is typically handled by a lower-level mechanism such as JDBC in Java.
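For instance, a minimal sketch of that lower layer might look like the following (the connection URL, table and column names are made up for illustration; an ORM such as Hibernate would issue very similar JDBC calls under the hood and then map the result columns onto object fields):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcTransferSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical in-memory database URL and schema, purely for illustration.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:h2:mem:demo", "sa", "");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, name FROM customer WHERE id = ?")) {
            ps.setLong(1, 42L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Mapping these columns onto object fields is the ORM's job;
                    // the actual transfer of bytes happens down here in JDBC.
                    long id = rs.getLong("id");
                    String name = rs.getString("name");
                    System.out.println(id + " -> " + name);
                }
            }
        }
    }
}
```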