What is Multiversion Concurrency Control (MVCC) and who supports it? [closed] - database

Jeff recently posted about his trouble with database deadlocks related to reading. Multiversion Concurrency Control (MVCC) claims to solve this problem. What is it, and which databases support it?
Updated: these support it (which others?)
Oracle
PostgreSQL

Oracle has had an excellent multiversion concurrency control system in place for a very long time (at least since Oracle 8.0).
The following should help.
User A starts a transaction and is updating 1000 rows with some value at time T1.
User B reads the same 1000 rows at time T2.
User A updates row 543 with value Y (the original value was X).
User B reaches row 543 and finds that a transaction has been in progress on it since time T1.
The database returns the unmodified record from the undo (rollback) segments. The returned value is the one that was committed at a time less than or equal to T2.
If the record cannot be retrieved from the undo segments, it means the database is not set up appropriately; more space needs to be allocated to undo.
This way read consistency is achieved: within a transaction, the returned results are always consistent with respect to the transaction's start time.
I have tried to explain this in the simplest terms possible... there is a lot more to multiversioning in databases.
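Below is a minimal, hypothetical sketch of the same two-session scenario in SQL (the accounts table and balance column are made up for illustration):

    -- Session A (writer), at time T1: updates row 543 but does not commit yet.
    UPDATE accounts SET balance = 'Y' WHERE id = 543;   -- original value was 'X'

    -- Session B (reader), starting its query at time T2:
    -- Oracle rebuilds the pre-update image of row 543 from undo,
    -- so this still returns 'X' and is never blocked by Session A.
    SELECT balance FROM accounts WHERE id = 543;

    -- Session A commits; only queries that start after this point see 'Y'.
    COMMIT;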

PostgreSQL's Multi-Version Concurrency Control
As well as this article, which features diagrams of how MVCC works when issuing INSERT, UPDATE, and DELETE statements.
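A quick way to peek at this versioning from SQL: PostgreSQL exposes its MVCC bookkeeping as system columns on every row (the table name here is hypothetical):

    -- xmin is the ID of the transaction that inserted this row version;
    -- xmax is (roughly) the ID of the transaction that deleted or superseded it,
    -- and shows as 0 for a live, unlocked row.
    SELECT xmin, xmax, ctid, *
    FROM my_table
    LIMIT 5;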

The following have an implementation of MVCC:
SQL Server 2005 (Non-default, SET READ_COMMITTED_SNAPSHOT ON)
http://msdn.microsoft.com/en-us/library/ms345124.aspx
Oracle (since version 8)
MySQL 5 (only with InnoDB tables)
PostgreSQL
Firebird
Informix
I'm pretty sure Sybase and IBM DB2 Mainframe/LUW do not have an implementation of MVCC.

Firebird does it; they call it MGA (Multi-Generational Architecture).
They keep the original version intact and add a new version that only the session using it can see. On commit, the older version is disabled and the newer version is enabled for everybody (the file piles up with data and needs regular cleanup).
Oracle, by contrast, overwrites the data in place and uses rollback segments/undo tablespaces to serve other sessions and to roll back.

XtremeData dbX supports MVCC.
In addition, dbX can make use of SQL primitives implemented in FPGA hardware.

SAP HANA also uses MVCC.
SAP HANA is a full in-memory computing system, so the MVCC cost for selects is very low... :)

Here is a link to the PostgreSQL doc page on MVCC. The choice quote (emphasis mine):
The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
This is why Jeff was so confounded by his deadlocks. A read should never be able to cause them.

SQL Server 2005 and up offer MVCC as an option; it isn't the default, however. MS calls it snapshot isolation, if memory serves.
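A minimal sketch of turning it on (the database name MyDb and the table dbo.Orders are hypothetical):

    -- Allow the SNAPSHOT isolation level in the database.
    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Optionally make ordinary READ COMMITTED use row versioning too,
    -- so readers stop taking shared locks that block writers.
    ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;

    -- A session can then opt in explicitly:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRANSACTION;
    SELECT * FROM dbo.Orders;   -- reads a consistent snapshot
    COMMIT;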

MVCC can also be implemented manually, by adding a version number column to your tables, and always doing inserts instead of updates.
The cost of this is a much larger database, and slower selects since each one needs a subquery to find the latest record.
It's an excellent solution for systems that require 100% auditing for all changes.
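A minimal sketch of the idea, with hypothetical table and column names:

    -- Every logical row keeps all of its versions; (id, version) is the key.
    CREATE TABLE customer (
        id      INTEGER      NOT NULL,
        version INTEGER      NOT NULL,
        name    VARCHAR(100),
        email   VARCHAR(100),
        PRIMARY KEY (id, version)
    );

    -- An "update" is really an insert of the next version.
    INSERT INTO customer (id, version, name, email)
    VALUES (42, 3, 'Alice', 'alice@example.com');

    -- Reading the current state needs a subquery to find the latest version,
    -- which is the slower-selects cost mentioned above.
    SELECT c.*
    FROM customer c
    WHERE c.version = (SELECT MAX(v.version)
                       FROM customer v
                       WHERE v.id = c.id);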

MySQL also uses MVCC by default if you use InnoDB tables:
http://dev.mysql.com/doc/refman/5.0/en/innodb-multi-versioning.html
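A minimal sketch (hypothetical table name) of the consistent, non-locking reads this gives you:

    -- MVCC is per storage engine, so the table must be InnoDB.
    CREATE TABLE t (
        id  INT PRIMARY KEY,
        val VARCHAR(50)
    ) ENGINE=InnoDB;

    -- Session 1
    START TRANSACTION;
    UPDATE t SET val = 'new' WHERE id = 1;   -- not committed yet

    -- Session 2 (REPEATABLE READ, the InnoDB default)
    START TRANSACTION;
    SELECT val FROM t WHERE id = 1;          -- sees the old value, is not blocked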

McObject announced in 11/09 that it has added an optional MVCC transaction manager to its eXtremeDB embedded database:
http://www.mcobject.com/november9/2009
eXtremeDB, originally developed as an in-memory database system (IMDS), is now available in editions with hybrid (in-memory/on-disk) storage, High Availability, 64-bit support and more.

There's a good explanation of MVCC -- with diagrams -- and some performance numbers for eXtremeDB in this article, written by McObject's co-founder and CEO, in RTC Magazine:
http://www.rtcmagazine.com/articles/view/101612
Clearly MVCC is increasingly beneficial as an application scales to include many tasks executing on multiple CPU cores.

DB2 version 9.7 has a licensed version of Postgres Plus in it. This means that DB2 (in the right mode) supports this feature.

Berkeley DB also supports MVCC.
And when the BDB storage engine is used in MySQL, MySQL also supports MVCC.
Berkeley DB is a very powerful, customizable, fully ACID-compliant DBMS. It supports several different methods of indexing and master-slave replication, and it can be used as a pure key-value store with its own dynamic API or queried with SQL if wanted. Worth taking a look at.
Another document-oriented DBMS embracing MVCC is CouchDB, where MVCC is also a big plus for the built-in peer-to-peer replication.

From http://vschart.com/list/multiversion-concurrency-control/
Couchbase,
OrientDB,
CouchDB,
PostgreSQL,
Project Voldemort,
BigTable,
Percona Server,
HyperGraphDB,
Drizzle,
Cloudant,
IBM DB2,
InterSystems Caché,
InterBase

Related

Should I use In memory SQL (Hekaton) for queue messaging system?

I work with a platform that has a messaging system that uses SQL Server tables as queues.
That system was based on this: Using Tables as queues
At the moment we are facing some scalability issues, since this distribution scheme is mainly based on SQL locks and disk operations in order to guarantee the durability/coherence of the data.
In order to solve the disk-based I/O bottleneck and to improve the poor distribution logic, we are thinking of changing the disk-based SQL tables to In-Memory OLTP (Hekaton), available in SQL Server 2014 & 2016.
I've read some material about Hekaton already, but I'm still not sure whether it is possible to implement those queues in memory and whether this is the best approach.
Most of those queues implement pessimistic concurrency, while Hekaton uses no locking system, only optimistic concurrency (based on multiversioning). Is it "always" (I know this is a bad word) possible to change pessimistic concurrency into optimistic concurrency? For example, on the above queues.
Is Hekaton made for many inserts/deletes (enqueue/dequeue), ordered rows (FIFO queues), and large variations in table size (workload variations on the server will increase/decrease the queue sizes)? Will it be possible to properly update the statistics used for the query performance of natively compiled stored procedures?
I feel that natively compiled stored procedures will improve performance a lot, but I'm not sure whether this kind of implementation (correlated FIFO queues) is a good fit for Hekaton, since I'm not finding any examples of "in-memory queue" implementations using Hekaton.
You can implement what you described in Hekaton - as you mentioned, the app will have to take care of retrying in case a transaction is aborted due to a conflict on the same row. Having said that, you also have to consider that SQL Server 2014 does not support large binary objects in memory-optimized tables; you will need to use SQL Server 2016 or work around it, as we do for ASP.NET session state:
http://blogs.msdn.com/b/kenkilty/archive/2014/07/03/asp-net-session-state-using-sql-sever-in-memory.aspx
Hekaton is designed for OLTP, that is, lots of inserts, updates, and deletes.
Plan the memory requirements ahead:
https://msdn.microsoft.com/en-us/library/dn133186.aspx
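A minimal, hypothetical sketch of what such a queue table could look like (names and sizes are made up, and it assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup configured):

    -- Memory-optimized queue table; DURABILITY = SCHEMA_AND_DATA keeps rows
    -- across restarts (SCHEMA_ONLY would be faster but non-durable).
    CREATE TABLE dbo.MessageQueue
    (
        MessageId  BIGINT IDENTITY(1,1) NOT NULL
                   PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
        EnqueuedAt DATETIME2      NOT NULL,
        Payload    NVARCHAR(2000) NOT NULL,           -- LOB types need SQL Server 2016+
        INDEX ix_EnqueuedAt NONCLUSTERED (EnqueuedAt) -- range index for FIFO dequeue order
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

The dequeue logic (take the oldest row and delete it) would typically live in a natively compiled stored procedure, with the caller retrying whenever a write-write conflict aborts the transaction.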

What data storage systems allow queries about the state of the system in the past?

I'm designing a system where questions like "what was the sum of all records matching certain criteria, at a certain time" are important.
Multiversion concurrency control (MVCC) seems like the way to go here, since it keeps an audit trail forever.
However, it would be nice if the data storage could handle this for me, rather than cobbling it together out of other database features. Now, Oracle, CouchDB, and other database engines use MVCC, but only behind the scenes. They use it to resolve conflicts or to decide what to do when a long-running query encounters recently updated data. But I don't know of any systems that allow you to explicitly say "in the system state of 17:00, January 20, 2009, what does this query return?". So are there such systems out there? Ideally, open source?
Take a look at flashback queries with Oracle.
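For example (the table and column names are hypothetical), an Oracle flashback query looks like this:

    -- Query the table as it stood at a point in the past (limited by how much
    -- undo/flashback history the database has been configured to retain).
    SELECT SUM(amount)
    FROM orders AS OF TIMESTAMP
         TO_TIMESTAMP('2009-01-20 17:00:00', 'YYYY-MM-DD HH24:MI:SS')
    WHERE status = 'OPEN';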
As far as I know, MVCC doesn't guarantee the duration of earlier versions of rows. Its target is concurrency control for transactions; when all transactions for row 'x' have been committed, there's no real need for MVCC to keep earlier versions of that row.
You're thinking of Temporal Databases. Snodgrass's old book Developing Time-Oriented Database Applications in SQL is worth skimming, especially now that it's available as a PDF. If I had to do a temporal database from scratch again, I'd also read Temporal Data & the Relational Model, and anything else dealing with 6NF. (Where '6NF' <> 'DKNF'.)

Has open source ever created a single file database that auto handles transactions?

Has open source ever created a single-file database that has better performance when handling large sets of SQL queries that aren't delivered in formal SQL transactions? I work with a .NET server that does some heavy replication of thousands of rows of data from another server, and it does so in a 1-by-1 fashion without formal SQL transactions. Therefore I cannot use SQLite, Firebird, or JavaDB, because none of them automatically batch the transactions, so the performance is dismal: each insert waits for the success of the previous one, etc. So I am forced to use a heavier database like SQL Server, MySQL, Postgres, or Oracle.
Does anyone know of a flat-file database (with a JDBC driver) that would support auto-batching transactions and solve my problem?
The main thing I don't like about the heavier databases is the lack of the ability to see inside the database with a one-mouse-click operation, like you can with SQLite.
I tried creating a SQLite database and then set PRAGMA read_uncommitted=TRUE; and it didn't result in any performance improvement.
I think that Firebird can work for this.
Firebird has a good .NET provider and many solutions for replication.
Maybe you can read this article about Firebird transactions.
Try Hypersonic DB (HSQLDB) - http://hsqldb.org/doc/guide/ch02.html#N104FC
If you want your transactions to be durable (i.e. survive a power failure) then the database will HAVE to write to the disc after each transaction (this is usually a log of some sort).
If your transactions are very small, this will result in a huge number of writes and very poor performance, even on a battery-backed RAID controller or an SSD, and worse performance still on consumer-grade hardware.
The only way of avoiding this is to somehow disable the flush at txn commit (which of course breaks durability). I have no idea which ones support this, but it should be easy to find out.
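For SQLite specifically, a minimal sketch of the two usual workarounds - batching many inserts into one explicit transaction, and relaxing the per-commit flush - looks like this (the table name is hypothetical):

    -- Option 1: wrap many inserts in a single transaction, so the durable
    -- flush happens once per batch instead of once per row.
    BEGIN TRANSACTION;
    INSERT INTO replicated_rows VALUES (1, 'a');
    INSERT INTO replicated_rows VALUES (2, 'b');
    -- ... thousands more ...
    COMMIT;

    -- Option 2: relax the flush at commit (trades durability for speed).
    PRAGMA synchronous = OFF;     -- or NORMAL
    PRAGMA journal_mode = WAL;    -- write-ahead logging reduces fsync cost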

realtime system database use

Given a .NET environment with Windows CE, can you persist thousands of records per second in a local database (SQL Server 2008 - standard or CE)?
What are the performance issues with persisting realtime instrument data in a database versus a log file?
SQL Server 2008 standard is more than capable of those insertion rates PROVIDED you have hardware capable of supporting it.
The question you really need to be asking is: do I require the ability to search the captured data quickly?
This SO answer might be of interest: What does database query and insert speed depend on?
The number (and width) of indexes on a table will obviously have an impact on insertion rate.
If you are considering open-source, then MySQL is often cited as being able to handle high volumes.

SQL Server and Oracle, which one is better in terms of scalability? [closed]

MS SQL Server and Oracle, which one is better in terms of scalability?
For example, if the data size reaches 500 TB, etc.
Both Oracle and SQL Server are shared-disk databases so they are constrained by disk bandwidth for queries that table scan over large volumes of data. Products such as Teradata, Netezza or DB/2 Parallel Edition are 'shared nothing' architectures where the database stores horizontal partitions on the individual nodes. This type of architecture gives the best parallel query performance as the local disks on each node are not constrained through a central bottleneck on a SAN.
Shared-disk systems (such as Oracle Real Application Clusters or clustered SQL Server installations) still require a shared SAN, which has constrained bandwidth for streaming. On a VLDB this can seriously restrict the table-scanning performance it is possible to achieve. Most data warehouse queries run table or range scans across large blocks of data. If the query will hit more than a few percent of rows, a single table scan is often the optimal query plan.
Multiple local direct-attach disk arrays on nodes gives more disk bandwidth.
Having said that, I am aware of an Oracle DW shop (a major European telco) that has an Oracle-based data warehouse that loads 600 GB per day, so the shared-disk architecture does not appear to impose insurmountable limitations.
Between MS SQL and Oracle there are some differences. IMHO Oracle has better VLDB support than SQL Server for the following reasons:
Oracle has native support for bitmap indexes, which are an index structure suitable for high-speed data warehouse queries. They essentially trade CPU for I/O, as they are run-length encoded and use relatively little space. On the other hand, Microsoft claims that index intersection is not appreciably slower. (See the sketch after this list.)
Oracle has better table partitioning facilities than SQL Server. IIRC, table partitioning in SQL Server 2005 can only be done on a single column.
Oracle can be run on somewhat larger hardware than SQL Server, although one can run SQL server on some quite respectably large systems.
Oracle has more mature support for materialized views and query rewrite to optimise relational queries. SQL Server 2005 does have some query rewrite capability, but it is poorly documented and I haven't seen it used in a production system. However, Microsoft will suggest that you use Analysis Services, which does actually support shared-nothing configurations.
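A minimal sketch of the bitmap-index and partitioning features mentioned above (the table and column names are hypothetical):

    -- Range-partitioned fact table, one partition per year.
    CREATE TABLE sales (
        sale_id    NUMBER,
        sale_date  DATE,
        region_id  NUMBER,
        amount     NUMBER
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION p2007 VALUES LESS THAN (DATE '2008-01-01'),
        PARTITION p2008 VALUES LESS THAN (DATE '2009-01-01')
    );

    -- Bitmap index on a low-cardinality column; LOCAL keeps one index
    -- segment per table partition.
    CREATE BITMAP INDEX sales_region_bix ON sales (region_id) LOCAL;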
Unless you have truly biblical data volumes and are choosing between Oracle and a shared-nothing architecture such as Teradata, you will probably see little practical difference between Oracle and SQL Server. Particularly since the introduction of SQL Server 2005, the partitioning facilities in SQL Server are viewed as good enough, and there are plenty of examples of multi-terabyte systems that have been successfully implemented on it.
When you are talking 500TB, that is (a) big and (b) specialized.
I'd be going to a consultancy firm with appropriate specialists to look at the existing skill sets, integration with existing technology stacks, expected usage, backup/recovery/DR requirements....
In short, it's not the sort of project I'd be heading into based on opinions from stackoverflow. No offence intended, but there's simply too many factors to take into account, a lot of which would be business confidential.
Whether Oracle or MSSQL will scale/perform better is question #15. The data model is the first make-it-or-break-it item, regardless of whether you're running Oracle, MSSQL, Informix, or anything else. Data model structure, what kind of application it is, how it accesses the db, which platform your developers know well enough to target for a large system, etc. are the first questions you should ask yourself.
I've worked as a DBA on Oracle (although some years back) and I use MSSQL extensively now, although not as a formal DBA. My advice would be that in the vast majority of cases both will meet everything you can throw at them, and your performance issues will be much more dependent upon database design and deployment than on the underlying characteristics of the products, which in both cases are absolutely and utterly solid (MSSQL is the best product that MS makes in many people's opinion, so don't let the usual perception of MS blind you on that).
Myself I would tend towards MSSQL unless your system is going to be very large and truly enterprise level (massive numbers of users, multiple 9's uptime etc.) simply because in my experience Oracle tends to require a higher level of DBA knowledge and maintenance than MSSQL to get the best out of it. Oracle also tends to be more expensive, both for initial deployment and in the cost to hire DBAs for it. OTOH if you are looking at an enterprise system then Oracle would have the edge, not least because if you can afford it their support is second to none.
I have to agree with those who said design was more important.
I've worked with super-fast and super-slow databases of many different flavors (the absolute worst being an Oracle database, but it wasn't Oracle's fault). The design of the database, and how you decide to index it, partition it, and query it, has far more to do with scalability than whether the product is MS SQL Server or Oracle.
I think you may more easily find Oracle DBAs with terabyte database experience (running a large database is a specialty, just like knowing a particular flavor of SQL), but that could depend on your local area.
Oracle people will tell you Oracle is better; SQL Server people will tell you SQL Server is better.
I say they scale pretty much the same. Use what you know better. There are databases out there that are that size on Oracle as well as on SQL Server.
When you get to OBSCENE database sizes (where over 1TB is really big enough, and 500TB is frigging massive), then operational support must come very high up on the list of requirements. With that much data, you don't mess about with penny pinching system specifications.
How are you going to back up that size of system? Upgrade the OS and patch the database? Are scalability and reliability a concern?
I have experience of both Oracle and MS SQL, and for the really really big systems (users, data or importance) then Oracle is better designed for operational support and data management.
Ever tried to back up and restore a 1TB+ SQL Server database split over multiple databases on multiple instances, with transaction log files being spat out everywhere by each database, while trying to keep it all in sync? Good luck with that.
With Oracle, you have ONE database (so I disagree that the "shared nothing" approach is better) with ONE set of REDO logs(1) and one set of archive logs(2), and you can just add extra hardware nodes without changing (i.e. repartitioning) your application and data.
(1) Redo logs are, of course, mirrored.
(2) Archive logs are, of course, stored in multiple locations.
It would also depend on what your application is meant for. If it uses only inserts with very few updates, then I think MSSQL would be more scalable and better in terms of performance. However, if one has lots of updates, then Oracle would scale up better.
I very much doubt that you are going to get an objective answer to that particular question, until you come across anyone that has implemented the same database (schema, data, etc.) on both platforms.
However, given the fact that you can find millions of happy users of both databases, I dare say it's not too much of a stretch to say either will scale just fine (I've seen a snappy SQL Server 2005 implementation of 300 TB that seemed pretty responsive).
Oracle is like a high-quality manual film camera, which needs the best photographer to take the best picture, while MS SQL is like an automatic digital camera. In the old days, of course, all professional photographers used film cameras; now think about how many professional photographers use automatic digital cameras.
