I'm pretty sure that with a relational database, it's faster and better to read 50 records at once than to make 50 calls for one record each. Is there a performance benefit from performing multiple writes all at once? If not, why not?
Probably depends on the RDBMS and the storage engine, but at least in MySQL/InnoDB, doing multiple writes in one transaction (as well as the multi-row insert syntax, which, afaik, is a MySQL extension) means the non-unique indexes don't have to be updated before the transaction is committed; the index is updated once with all the new values (since it's a B-tree, this is much faster). It's possible that the RDBMS optimizes other writes as well, turning random writes into sequential ones.
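As a rough sketch of the difference (the table and column names here are made up), compare 50 separate autocommitted statements against one batched unit of work:

-- 50 separate round trips, each with its own implicit commit:
INSERT INTO events (user_id, payload) VALUES (1, 'a');
-- ... repeated 50 times ...

-- One transaction using MySQL's multi-row INSERT extension:
START TRANSACTION;
INSERT INTO events (user_id, payload) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');   -- as many rows as fit within max_allowed_packet
COMMIT;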
Also, if there is table-level locking (as in MyISAM), locking the table once, writing multiple records and then unlocking removes the overhead of a lock/unlock for every write.
So generally, there is a performance gain, but it depends on the database server used.
Doing all your reads at once makes sense, although there are some problems in it which I'll touch on in a minute.
Doing all your writes at once poses a particular problem: the data isn't in the database until you actually put it there. If you're waiting for some batching threshold (let's say 50), then transaction 1 has to wait for (unrelated) transactions 2-50 to arrive before anything goes to the database. This means that in the meantime (which could be seconds, minutes, or hours) nobody knows those records exist, or, if they're updates, what the new values are. (The same applies to reads, but the other way around: your data may be out of date by the time you get to use it.)
Performance-wise, I cannot imagine that grouping writes closer together would fail to give some benefit (in other words: you should always get a performance boost by grouping). If nothing else, you have a better chance of hitting in-memory caches rather than going to disk than if you issue the writes separately. @Darhazer brings up a good point about locking. So, strictly from a total-time-spent-writing point of view, it is better to group them. From an application-performance point of view, it's difficult to say without an intricate knowledge of the business requirements of the app.
Related
Let's say, theoretically, I have a database with an absurd number of tables (100,000+). Would that lead to any sort of performance issues, provided most queries (99%+) only touch 2-3 tables at a time?
Therefore, my question is this:
What operations are O(n) on the number of tables in PostgreSQL?
Please note, no answers about how this is bad design, or how I need to plan out more about what I am designing. Just assume that for my situation, having a huge number of tables is the best design.
pg_dump, pg_restore and pg_upgrade are actually worse than that, being O(N^2). That used to be a huge problem, although in recent versions the constant on that N^2 has been reduced so low that for 100,000 tables it is probably not your biggest problem. However, there are worse cases: dumping tables can be O(M^2) (maybe M^3, I don't recall the exact details anymore) for each table, where M is the number of columns in the table. This only applies when the columns have check constraints or defaults or other additional info beyond a name and type. All of these problems are particularly nasty when you have no operational problems to warn you, but then suddenly discover you can't upgrade within a reasonable time frame.
Some physical backup methods, like barman using rsync, are also O(N^2) in the number of files, which is at least as great as the number of tables.
During normal operations, the stats collector can be a big bottleneck. Every time someone requests updated stats on some table, it has to write out a file covering all tables in that database. Writing this out is O(N) for the tables in that database. (It used to be worse, writing out one file for the whole instance, not just the database.) This can be made even worse on some filesystems which, when renaming one file over the top of an existing one, implicitly fsync the file. Putting the stats file on a RAM disc can at least ameliorate that.
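On versions that still use the on-disk stats collector (before PostgreSQL 15), the location of those files is configurable, so one way to do that is to point it at a tmpfs mount; the path below is only an example and assumes such a mount exists:

-- assumes /run/pg_stats_tmp is a RAM-backed filesystem writable by the postgres user
ALTER SYSTEM SET stats_temp_directory = '/run/pg_stats_tmp';
SELECT pg_reload_conf();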
The autovacuum workers loop over every table (roughly once per autovacuum_naptime) to decide whether it needs to be vacuumed, so a huge number of tables can slow this down. This can also be worse than O(N), because for each table there is some possibility that it will request updated stats on it. Worse, it could block all concurrent autovacuum workers while doing so (this last part was fixed in a backpatch for all supported versions).
Another problem you might run into is that each database backend maintains a cache of metadata on each table (or other object) it has accessed during its lifetime. There is no mechanism for expiring this cache, so if each connection touches a huge number of tables it will start consuming a lot of memory, and one copy per backend, since it is not shared. If you have a connection pooler which holds connections open indefinitely, this can really add up, as each connection lives long enough to touch many tables.
pg_dump with some options, probably -s; some other options make it depend more on the size of the data.
The project requires storing binary data in a PostgreSQL database (a project requirement). For that purpose we made a table with the following columns:
id : integer, primary key, generated by client
data : bytea, for storing client binary data
The client is a C++ program, running on Linux.
The rows must be inserted (initialized with a chunk of binary data) and afterwards updated (concatenating additional binary data to the data field).
Simple tests have shown that this yields better performance.
Depending on your answers, we will either make the client use concurrent threads to insert/update data (with different DB connections), or a single thread with only one DB connection.
We don't have much experience with PostgreSQL, so could you help us with some pointers concerning possible bottlenecks, and whether using multiple threads to insert data is better than using a single thread?
Thank you :)
Edit 1:
More detailed information:
there will be only one client accessing the database, using only one Linux process
database and client are on the same high performance server, but this shouldn't matter; the client must be fast no matter the machine, without additional client configuration
we will get a new stream of data every 10 seconds; each stream provides 16000 new bytes per 0.5 seconds (CBR, but we can use buffering and only do inserts every 4 seconds max)
stream will last anywhere between 10 seconds and 5 minutes
It makes very little sense that you would get better performance by inserting a row and then appending to it if you are using bytea.
PostgreSQL's MVCC design means that an UPDATE is logically equivalent to a DELETE plus an INSERT. When you insert the row and then update it, what actually happens is that the original tuple you inserted is marked as deleted and a new tuple is written that contains the concatenation of the old and added data.
I question your testing methodology - can you explain in more detail how you determined that insert-then-append was faster? It makes no sense.
Beyond that, I think this question is too broad as written to really say much of use. You've given no details or numbers; no estimates of binary data size, rowcount estimates, client count estimates, etc.
bytea insert performance is no different to any other insert performance tuning in PostgreSQL. All the same advice applies: Batch work into transactions, use multiple concurrent sessions (but not too many; rule of thumb is number_of_cpus + number_of_hard_drives) to insert data, avoid having transactions use each others' data so you don't need UPDATE locks, use async commit and/or a commit_delay if you don't have a disk subsystem with a safe write-back cache like a battery-backed RAID controller, etc.
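As a minimal sketch of what that looks like per session (the table name is hypothetical, and the settings are the ones mentioned above; note that changing commit_delay may require elevated privileges depending on version):

-- only use synchronous_commit = off if losing the last few transactions on a crash is acceptable
SET synchronous_commit = off;
SET commit_delay = 10000;   -- microseconds; only affects synchronous commits, may need superuser

BEGIN;
INSERT INTO blobs (id, data) VALUES (1, '\x0011');
INSERT INTO blobs (id, data) VALUES (2, '\x2233');
-- ... a few hundred to a few thousand rows per transaction ...
COMMIT;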
Given the updated stats you provided in the main comments thread, the amount of data you want to consume sounds entirely practical with appropriate hardware and application design. Your peak load might be achievable even on a plain hard drive if you had to commit every block that came in, since it'd require about 60 transactions per second. You could use a commit_delay to achieve group commit and significantly lower fsync() overhead, or even use synchronous_commit = off if you can afford to lose a time window of transactions in case of a crash.
With a write-back caching storage device like a battery-backed cache RAID controller or an SSD with reliable power-loss-safe cache, this load should be easy to cope with.
I haven't benchmarked different scenarios for this, so I can only speak in general terms. If designing this myself, I'd be concerned about checkpoint stalls with PostgreSQL, and would want to make sure I could buffer a bit of data. It sounds like you can so you should be OK.
Here's the first approach I'd test, benchmark and load-test, as it's in my view probably the most practical:
One connection per data stream, synchronous_commit = off + a commit_delay.
INSERT each 16 kB record as it comes in into a staging table (UNLOGGED or TEMPORARY, if you can afford to lose incomplete records on a crash) and let Pg synchronize and group up the commits. When each stream ends, read the byte arrays, concatenate them, and write the record to the final table.
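One possible shape for that staging table, with names chosen to match the aggregate query further down (the block_seq column is my addition, just to preserve chunk ordering):

CREATE UNLOGGED TABLE temp_stream_table (
    stream_id   integer NOT NULL,
    block_seq   integer NOT NULL,   -- position of the chunk within its stream
    data_block  bytea   NOT NULL,
    PRIMARY KEY (stream_id, block_seq)
);

-- one of these per incoming 16 kB chunk:
INSERT INTO temp_stream_table (stream_id, block_seq, data_block) VALUES ($1, $2, $3);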
For absolutely best speed with this approach, implement a bytea_agg aggregate function for bytea as an extension module (and submit it to PostgreSQL for inclusion in future versions). In reality it's likely you can get away with doing the bytea concatenation in your application by reading the data out, or with the rather inefficient and nonlinearly scaling:
CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea);
INSERT INTO final_table SELECT stream_id, bytea_agg(data_block) FROM temp_stream_table;
You would want to be sure to tune your checkpointing behaviour, and if you were using an ordinary or UNLOGGED table rather than a TEMPORARY table to accumulate those 16kb records, you'd need to make sure it was being quite aggressively VACUUMed.
See also:
Whats the fastest way to do a bulk insert into Postgres?
How to speed up insertion performance in PostgreSQL
After posting the question here, I learned that NoSQL databases are better at scaling out because they make a trade-off between transaction support and scalability.
So I wonder: in what circumstances are transactions not that important, so that scalability is preferable to transaction support?
Well, I would say first that NoSQL is better at scaling in some situations, but not all.
Full ACID transactions are Atomic, Consistent, Isolated and Durable. If you lose transactions, you will lose some or all of ACID within the datastore.
There are many ways to restore these functions with other asynchronous systems like message queues that themselves are durable. You can shove data onto a durable message queue, pop the data and deal with it in your NoSQL, then, when you can confirm it's stored to your required minimum, you can flag the message as consumed. It's the D in ACID, but distributed and asynchronous. There are ways to ensure the others, but they are often sacrificed to some extent, or moved into another place in the system. With some NoSQL solutions, you just have to move consistency into the application so it doesn't try to store invalid data.
When you start moving away from database driven transactions, you must increase your application testing dramatically to ensure your system doesn't fail (for some values of fail).
There are essentially no situations where transactions and constraints are not important in a system that has both read and write requirements. If they weren't, you wouldn't care about your data at all (and some people don't, but regret it later). There are, however, levels of "caring". It's just a matter of how you end up at ACID or some pseudo-ACID that's "good enough". An RDBMS makes caring about your data cheap. NoSQL makes caring about your data expensive, but it makes scaling cheap(er) (in some cases). There are many companies with multi-terabyte databases in RDBMSes, so to say unilaterally that "they don't scale" is simply inaccurate. Multi-terabyte SQL databases, however, can cost lots of money, depending on the use case (you can, after all, just slap a RAID 10 array with a few 3TB drives onto a computer and throw a database engine on it; it might take several minutes to a few hours to do any kind of table scan on a big table, or even an indexed look-up, but if you don't care, it's cheap and multi-terabyte).
The biggest category is read-only type queries, where an aborted or botched transaction can simply be repeated. Anything where you are changing an underlying state, or want to guarantee once and only once activity, should have proper transactional semantics.
That is, "I want to order one widget, charge my credit card" should be a proper transaction: I don't want my card charged unless the widget is ordered, and the vendor doesn't want the widget sent unless the card is charged. "Report the shipment status of order xyz" doesn't need to be transactional -- if I don't get an answer, I can hit reload.
Much of it is just a bit of lateral thinking.
The whole point of a transaction is that you wrap up several operations, and should any fail, all that have succeeded get rolled back; while the transaction is in progress, records are locked and, unless you are reading uncommitted, you don't see any of the individual changes of state until the transaction is committed.
Doing all that with distributed systems is expensive, because you need one 'central', difficult-to-scale point that needs to 'know' all about the others.
So instead of "Order this, charge my card, and show me my current balance",
you do "Try to order this; if it's in stock, charge my card, and if my card gets charged, the current known balance will be this."
There's a risk that the order will be placed but payment will fail, so you need to deal with that. There's a risk that the proposed balance of the card may not be entirely accurate, hence add weasel words and show the potential effect of payment as opposed to the result.
It's not so much whether transactions are important; it's that, since they aren't as well supported in NoSQL systems, the question becomes where and how you can get away with not using them.
In plain English, what are the disadvantages and advantages of using
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
in a query for .NET applications and reporting services applications?
This isolation level allows dirty reads. One transaction may see uncommitted changes made by some other transaction.
To maintain the highest level of isolation, a DBMS usually acquires locks on data, which may result in a loss of concurrency and a high locking overhead. This isolation level relaxes this property.
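A quick illustration of what a dirty read means here, with two sessions and a made-up table:

-- Session A: changes a balance but has not committed yet
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session B: READ UNCOMMITTED sees the in-flight value
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE id = 1;   -- returns the reduced balance

-- If session A now issues ROLLBACK, session B has read data that never officially existed.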
You may want to check out the Wikipedia article on READ UNCOMMITTED for a few examples and further reading.
You may also be interested in checking out Jeff Atwood's blog article on how he and his team tackled a deadlock issue in the early days of Stack Overflow. According to Jeff:
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think: "In theory there is no difference between theory and practice. In practice there is."
I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems... As long as you know what you're doing.
One alternative to the READ UNCOMMITTED level that you may want to consider is the READ COMMITTED SNAPSHOT. Quoting Jeff again:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
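Enabling it is a database-level switch in SQL Server (the database name below is a placeholder; run it when no other connections are active, or with care):

ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;
-- queries still run at READ COMMITTED, but readers use row versions kept in tempdb
-- instead of taking shared locks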
My favorite use case for READ UNCOMMITTED is to debug something that is happening inside a transaction.
Start your software under a debugger. While you are stepping through the lines of code, it opens a transaction and modifies your database. While the code is stopped, you can open a query analyzer, set the READ UNCOMMITTED isolation level and run queries to see what is going on.
You can also use it to see whether long-running procedures are stuck or are correctly updating your database, using a query with COUNT(*).
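The monitoring side boils down to something like this (the table name is hypothetical; it's whatever your procedure is filling):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM staging_rows;                       -- grows while the procedure runs
SELECT TOP (10) * FROM staging_rows ORDER BY id DESC;    -- peek at the latest uncommitted rows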
It is great if your company loves to make overly complex stored procedures.
This can be useful to see the progress of long insert queries and to make rough estimates (like a COUNT(*) or a rough SUM), etc.
In other words, the results the dirty read queries return are fine as long as you treat them as estimates and don't make any critical decisions based upon them.
The advantage is that it can be faster in some situations. The disadvantage is the result can be wrong (data which hasn't been committed yet could be returned) and there is no guarantee that the result is repeatable.
If you care about accuracy, don't use this.
More information is on MSDN:
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.
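So the two forms below are effectively equivalent for that table reference (the table name is illustrative):

-- session-level:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT SUM(total) FROM sales;

-- per-table hint, same effect for just this statement's reference to the table:
SELECT SUM(total) FROM sales WITH (NOLOCK);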
When is it ok to use READ UNCOMMITTED?
Rule of thumb
Good: Big aggregate reports showing constantly changing totals.
Risky: Nearly everything else.
The good news is that the majority of read-only reports fall in that Good category.
More detail...
Ok to use it:
Nearly all user-facing aggregate reports for current, non-static data e.g. Year to date sales.
It risks a margin of error (maybe < 0.1%), which is much lower than other sources of uncertainty such as input errors or simply the randomness of when exactly data gets recorded minute to minute.
That probably covers the majority of what a Business Intelligence department would do in, say, SSRS. The exception, of course, is anything with $ signs in front of it. Many people account for money with much more zeal than they apply to the related core metrics required to service the customer and generate that money. (I blame accountants.)
When risky
Any report that goes down to the detail level. If that detail is required, it usually implies that every row will be relevant to a decision. In fact, if you can't pull a small subset without blocking, it might be for the good reason that it's currently being edited.
Historical data. It rarely makes a practical difference, but whereas users understand that constantly changing data can't be perfect, they don't feel the same about static data. Dirty reads won't hurt here, but double reads occasionally can. Seeing as you shouldn't have blocks on static data anyway, why risk it?
Nearly anything that feeds an application which also has write capabilities.
When even the OK scenario is not OK.
Are any applications or update processes making use of big single transactions? Ones which remove then re-insert a lot of records you're reporting on? In that case you really can't use NOLOCK on those tables for anything.
Use READ_UNCOMMITTED in situations where the source is highly unlikely to change.
When reading historical data, e.g. some deployment logs from two days ago.
When reading metadata, again e.g. in a metadata-based application.
Don't use READ_UNCOMMITTED when you know the source may change during the fetch operation.
Regarding reporting, we use it on all of our reporting queries to prevent a query from bogging down databases. We can do that because we're pulling historical data, not up-to-the-microsecond data.
This will give you dirty reads, showing you transactions that are not committed yet. That is the most obvious answer. I don't think it's a good idea to use this just to speed up your reads. There are other ways of doing that if you use a good database design.
It's also interesting to note what's not happening. READ UNCOMMITTED does not just ignore other transactions' locks; it also does not take any shared locks of its own.
Consider that you are generating a large report, or migrating data out of your database using a large and possibly complex SELECT statement. This will take a shared lock that may be escalated to a shared table lock for the duration of your transaction. Other transactions may read from the table, but updates are impossible. This may be a bad idea if it's a production database, since production may stop completely.
If you are using READ UNCOMMITTED, you will not take a shared lock on the table. You may or may not see the results of some new transactions, depending on where in the table the data was inserted and how far your SELECT has read. You may also get the same data twice if, for example, a page split occurs (the data gets copied to another location in the data file).
So, if it's very important to you that data can be inserted while your SELECT is running, READ UNCOMMITTED may make sense. You have to accept that your report may contain some errors, but if it's based on millions of rows and only a few of them are updated while selecting the result, this may be "good enough". Your query may also fail altogether, since the uniqueness of a row is not guaranteed.
A better way altogether may be to use SNAPSHOT ISOLATION, but your applications may need some adjustments to use it. One example is if your application takes an exclusive lock on a row to prevent others from reading it and going into edit mode in the UI. SNAPSHOT ISOLATION also comes with a considerable performance penalty (especially on disk), but you may be able to overcome that by throwing hardware at the problem. :)
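For reference, snapshot isolation in SQL Server is enabled per database and then requested per transaction; it relies on row versioning in tempdb, which is where the extra disk cost comes from (the database name is a placeholder):

ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- then, in a session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;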
You may also consider restoring a backup of the database to use for reporting or loading data into a data warehouse.
It can be used for a simple table, for example an insert-only audit table, where there are no updates to existing rows and no foreign keys to other tables. The insert is a simple insert, which has little or no chance of being rolled back.
I always use READ UNCOMMITTED now. It's fast, with the fewest issues. When using other isolation levels you will almost always run into some blocking issues.
As long as you use auto-increment fields and pay a little more attention to inserts, you're fine, and you can say goodbye to blocking issues.
You can make errors with READ UNCOMMITTED, but to be honest, it is very easy to make sure your inserts are foolproof. Inserts/updates which use the results of a SELECT are the only thing you need to watch out for. (Use READ COMMITTED there, or ensure that dirty reads aren't going to cause a problem.)
So go with dirty reads (especially for big reports); your software will run more smoothly...
I'm creating an app that will have to put at max 32 GB of data into my database. I am using B-tree indexing because the reads will have range queries (like from 0 < time < 1hr).
At the beginning (database size = 0 GB), I get 60 to 70 writes per millisecond. After say 5 GB, the three databases I've tested (H2, Berkeley DB, Sybase SQL Anywhere) have REALLY slowed down, to under 5 writes per millisecond.
Questions:
Is this typical?
Would I still see this scalability issue if I REMOVED indexing?
What are the causes of this problem?
Notes:
Each record consists of a few ints
Yes; indexing improves fetch times at the cost of insert times. Your numbers sound reasonable - without knowing more.
You can benchmark it. You'll need to have a reasonable amount of data stored. Consider whether or not to index based upon the queries - heavy fetch and light insert? Index everywhere a WHERE clause might use it. Light fetch, heavy inserts? Probably avoid indexes. Mixed workload? Benchmark it!
When benchmarking, you want data that is as real or as realistic as possible, both in volume and in data domain (the distribution of the data - not just all "henry smith", but all manner of names, for example).
It is typical for indexes to sacrifice insert speed for access speed. You can see this with a database table (and I've seen these in the wild) that indexes every single column. There's nothing inherently wrong with that if the number of updates is small compared to the number of queries.
However, given that:
1/ You seem to be concerned that your writes slow down to 5/ms (that's still 5000/second),
2/ You're only writing a few integers per record; and
3/ Your queries are only based on time ranges,
you may want to consider bypassing a regular database and rolling your own sort-of-database (my thoughts are that you're collecting real-time data such as device readings).
If you're only ever writing sequentially-timed data, you can just use a flat file and periodically write the 'index' information separately (say at the start of every minute).
This will greatly speed up your writes but still allow a relatively efficient read process - worst case is you'll have to find the start of the relevant period and do a scan from there.
This of course depends on my assumptions about your usage being correct:
1/ You're writing records sequentially based on time.
2/ You only need to query on time ranges.
Yes, indexes will generally slow inserts down, while significantly speeding up selects (queries).
Do keep in mind that not all inserts into a B-tree are equal. It's a tree; if all you do is insert into it, it has to keep growing. The data structure allows for some padding, but if you keep inserting into it numbers that are growing sequentially, it has to keep adding new pages and/or shuffle things around to stay balanced. Make sure that your tests are inserting numbers that are well distributed (assuming that's how they will come in real life), and see if you can do anything to tell the B-tree how many items to expect from the beginning.
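That "padding" is usually a tunable knob. As one illustration of the general idea, in PostgreSQL syntax (the index, table and column names here are invented), you control how full each index page is packed at build time:

-- leave ~30% free space in each leaf page so future inserts into existing pages cause fewer splits;
-- for strictly append-only (monotonically increasing) keys a high fillfactor wastes less space instead
CREATE INDEX idx_readings_time ON readings (recorded_at) WITH (fillfactor = 70);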
Totally agree with @Richard-t - it is quite common in offline/batch scenarios to remove indexes completely before bulk updates to a corpus, only to reapply them when the update is complete.
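The usual pattern looks something like this (index and table names are placeholders, and the exact DROP/CREATE syntax varies a little by engine):

DROP INDEX idx_readings_time;                                 -- or disable it, where the engine supports that
-- ... bulk INSERT / COPY the new data ...
CREATE INDEX idx_readings_time ON readings (recorded_at);     -- rebuild once, in a single pass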
The type of indexes applied also influences insertion performance - for example, with a SQL Server clustered index, update I/O is used for data placement as well as for the index update, whereas nonclustered indexes are updated in separate (and therefore more expensive) I/O operations.
As with any engineering project, the best advice is to measure with real datasets (skews in page distribution, tearing, etc.).
I think somewhere in the BDB docs they mention that page size greatly affects this behavior in B-trees. Assuming you aren't doing much in the way of concurrency and you have fixed record sizes, you should try increasing your page size.