I have a problem with my PostgreSQL 11 database.
I have a database with many rows - 10+ million.
I analyzed my biggest tables (see the picture): the left side was one day, the right side is the next day (24 hours later).
- 60+ million updates and 30+ million updates every day
I'm using autovacuum:
autovacuum = on
vacuum_cost_delay = 0
vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 1
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 10000
autovacuum_max_workers = 3
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0
autovacuum_analyze_scale_factor = 0.00
autovacuum_vacuum_cost_delay = 5ms
autovacuum_vacuum_cost_limit = -1
I have monitoring of dead tuples:
It's around 600k+ dead tuples at any given time with this vacuum config.
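For reference, the monitoring is roughly a query like this against the standard statistics view pg_stat_user_tables (just an illustration of what I look at):

-- dead tuples per table, worst first
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;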
Problem: the database grows by 7-10 gigabytes every day! After a VACUUM FULL the database size is 69 GB, but after one week it is back to 110-115 GB. So, how can I change the config so that the database stops growing?
I can't increase the number of vacuum workers, and autovacuum_cost_limit is already at its maximum value.
I would appreciate advice from someone with more experience, since I'm just a junior DevOps engineer :)
Reset all these parameters back to the default setting, and change the cost delay for the table in question:
ALTER TABLE large SET (autovacuum_vacuum_cost_delay = 0);
This makes autovacuum on that table as fast as possible.
In addition, set maintenance_work_mem high for best autovacuum performance.
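For example, something along these lines would raise it cluster-wide (the 1GB value is purely illustrative; size it to your RAM):

ALTER SYSTEM SET maintenance_work_mem = '1GB';  -- illustrative value, not a recommendation
SELECT pg_reload_conf();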
What kind of indexes are these? Assuming the most common (btree):
Vacuuming indexes does not shrink them; it only frees up space for internal reuse, and that reuse is tightly constrained. Space can only be reused if a new tuple naturally belongs on that page (given the values of the other tuples remaining there), or, if the page becomes entirely empty, the whole page can be marked available to be linked into some other place in the tree.
So, for example, if you have a user table which is never deleted from and has an index on (or starting with) a timestamptz column "last_logged_on", then as your users drop out they will never get that column updated anymore, while all their former (still active) neighbors will have moved away to other index pages. Eventually you will build up long runs of index pages marking the last time former customers logged in, with only one or a few customers per index page. (Any index page containing zero former customers should have been delinked and reused already.)
As another example, if you have a lot of transient customers who eventually get deleted, and only a small fraction stick around long term, then you can get the same thing with the index on their customer_id if it is populated by a sequence. Each long-term customer could end up on an index page by themselves, as the transients end up getting deleted from around them.
These problems generally develop quite slowly and can be fixed easily with an occasional REINDEX (which can now, as of v12, be done CONCURRENTLY, meaning without locking out ordinary INSERT/UPDATE/DELETE on the table).
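For example, on v12 or later (the index name here is just a placeholder):

REINDEX INDEX CONCURRENTLY users_last_logged_on_idx;  -- placeholder name; v12+ only
-- on v11 the workaround is to CREATE INDEX CONCURRENTLY a replacement index and drop the old one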
How does Scylla determine when to evict data from its cache? For example, suppose table T has the following structure:
K1 C1 V1 V2 V3
I populate the above table with 500 rows (e.g., the query SELECT * FROM T WHERE K1 = X AND C1 = Y returns 500 rows).
Some time later I insert a new row into the above table that would cause the above query to return 501 rows, instead of 500 rows.
Does Scylla know to automatically evict the 500 rows from its cache or at least to add row 501 to its cache? If not, most queries will quickly start returning outdated data. Similarly, what happens if I don’t add a new row to the database, rather I update one of the existing 500 rows. Is Scylla aware of this modification and capable of updating its cache automatically? If yes, is it smart enough only to update the data that changed (the new row or the row that was modified) or does it evict/update all 500 rows?
Are there any cases to be aware of where data is updated in SSTables but not in memory?
Thanks
P.S
I read a lot about how caching works in Scylla but I didn’t see a clear answer to the above question. If Scylla is indeed aware of background updates I would also be curious to learn HOW it achieves such dynamic and intelligent updating of its cache.
I think you are misunderstanding what the cache does in Scylla, or any database for that matter.
The row cache, as its name suggests, caches (i.e., keeps in memory) individual rows - not the results of entire requests. So the fact that a request at one point returned 500 rows does not mean that the next time this request comes Scylla will return the same 500 rows. Not at all. Let me try to explain what actually happens, although this is also documented elsewhere; I'll simplify some details to hopefully get the point across:
When a Scylla node boots up, all the data is located on disk (stored in files known as sstables) and nothing is in memory. When a user asks to read one specific row that is not already in the in-memory cache, this row is read from disk and then stored in the cache. If the user later reads the same row again, it is returned from cache immediately. If the user writes to this row, the row is updated in the cache as well as on disk (the details are slightly more complicated, there is also an in-memory table - memtable - but I'm trying to simplify). The cache is always up-to-date - if a row appears in it, it is correct. Of course it also may not appear in it.
The situation you describe in your question's text (although not the actual query you posted!) is about a scan of a slice of a partition, returning not one but many rows (500 or 501). Scylla needs to (and does) put in a bit more work to handle this case correctly:
When the scan of a certain range is done for the first time, Scylla reads those 500 rows in that range, and puts each of them in the row cache. But it also remembers that the cache is contiguous in that range - these 500 rows are everything that exists in this range. So when the user tries the same query again, the cache doesn't need to check if maybe there are additional rows between those 500 - it knows there aren't. If you later write a 501st row inside this range, this row is added to the cache, which knows it remained contiguous, so the next scan of this range will return 501 rows. Scylla does not need to evict the 500 rows just because one was added to the same partition.
If at some later point in time Scylla runs out of memory and needs to evict some rows from the cache, it may decide to evict all these 501 rows from the cache - or some of them. If it evicts some of them, it loses continuity: if it only remembers, say, 400 rows for the original range, then when the user asks to scan that range again, Scylla is forced (again, simplifying some details) to read all the rows in the range from disk, because it has no idea which specific rows it is missing in this range.
I have an Assets table with ~165,000 rows in it. However, the Assets make up "Collections" and each Collection may have ~10,000 items, which I want to save a "rank" for so users can see where a given asset ranks within the collection.
The rank can change (based on an internal score), so it needs to be updated periodically (a few times an hour).
That's currently being done on a per-collection basis with this:
UPDATE assets a
SET    rank = a2.seqnum
FROM  (SELECT a2.*,
              row_number() OVER (ORDER BY elo_rating DESC) AS seqnum
       FROM   assets a2
       WHERE  a2.collection_id = #{collection_id}) a2
WHERE  a2.id = a.id;
However, that's causing the size of the table to double (i.e. 1GB to 2GB) roughly every 24 hours.
A VACUUM FULL clears this up, but that doesn't feel like a real solution.
Can the query be adjusted to not create so much (what I assume is) temporary storage?
Running PostgreSQL 13.
Every update writes a new row version in Postgres. So (aside from TOASTed columns) updating every row in the table roughly doubles its size. That's what you observe. Dead tuples can later be cleaned up to shrink the physical size of the table - that's what VACUUM FULL does, expensively. See:
Are TOAST rows written for UPDATEs not changing the TOASTable column?
Alternatively, you might just not run VACUUM FULL and keep the table at ~ twice its minimal physical size. If you run plain VACUUM (without FULL!) often enough - and if you don't have long-running transactions blocking it - Postgres will have marked dead tuples in the free-space map by the time the next UPDATE kicks in and can reuse the disk space, thus staying at ~ twice its minimal size. That's probably cheaper than shrinking and re-growing the table all the time, as the most expensive part is typically physically growing the table. Be sure to have aggressive autovacuum settings for the table (a sketch follows the links below). See:
Aggressive Autovacuum on PostgreSQL
VACUUM returning disk space to operating system
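As a sketch, per-table settings along these lines would count as "aggressive" for your assets table (the numbers are illustrative, not a recommendation; tune them to your workload):

ALTER TABLE assets SET (
  autovacuum_vacuum_scale_factor  = 0.01,  -- vacuum after ~1 % of rows are dead
  autovacuum_vacuum_cost_delay    = 0,     -- let autovacuum run at full speed
  autovacuum_analyze_scale_factor = 0.02
);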
Probably better yet, break out the ranking into a minimal separate 1:1 table (a.k.a. "vertical partitioning"), so that only minimal rows have to be written "a few times an hour" - probably including the elo_rating you mention in the query, which seems to change at least as frequently (?). A sketch follows the links below.
(LEFT) JOIN to the main table in queries. While that adds considerable overhead, it may still be (substantially) cheaper. It depends on the complete picture, most importantly the average row size in table assets and the typical load apart from your costly updates.
See:
Many columns vs few tables - performance wise
UPDATE or INSERT & DELETE? Which is better for storage / performance with large text columns?
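Coming back to the 1:1 split suggested above, a minimal sketch; the side table, its column types, and the name asset_rank are assumptions based on your query:

-- narrow 1:1 side table holding only the volatile columns
CREATE TABLE asset_rank (
  asset_id   int PRIMARY KEY REFERENCES assets (id),
  elo_rating numeric,   -- assumed type
  rank       int
);

-- the periodic update now rewrites only these narrow rows
UPDATE asset_rank ar
SET    rank = sub.seqnum
FROM  (SELECT r.asset_id,
              row_number() OVER (ORDER BY r.elo_rating DESC) AS seqnum
       FROM   asset_rank r
       JOIN   assets a ON a.id = r.asset_id
       WHERE  a.collection_id = #{collection_id}) sub
WHERE  ar.asset_id = sub.asset_id;

Queries that need the rank then LEFT JOIN assets to asset_rank.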
One of my clients has a reservation-based system, similar to airlines, running on MS SQL 2005.
The way the previous company has designed it is to create an allocation as a set of rows.
Simple Example Being:
AllocationId | SeatNumber | IsSold
1234 | A01 | 0
1234 | A02 | 0
In the process of selling a seat the system will establish an update lock on the table.
We have a problem at the moment where the locking process is running slow and we are looking at ways to speed it up.
The table is already efficiently indexed, so we are looking at a hardware solution to speed up the process. The table has about 5 million active rows and sits on a RAID 50 SAS array.
I am assuming hard disk seek time is going to be the limiting factor in speeding up update locks when you have 5 million rows and are updating 2-5 rows at a time (I could be wrong).
I've heard about people using index partitioning over several disk arrays. Has anyone had similar experiences with trying to speed up locking? Can anyone give me some advice on a possible solution - what hardware might be upgraded, or what technology we could take advantage of, in order to speed up the update locks (without moving to a cluster)?
One last try…
It is clear that there are too many locks held for too long.
Once the system starts slowing down due to too many locks, there is no point in starting more transactions.
Therefore you should benchmark the system to find the optimal number of concurrent transactions, then use some queue system (or otherwise) to limit the number of concurrent transactions. SQL Server may have a setting to help (number of active connections etc.); otherwise you will have to write this in your application code.
Oracle is good at allowing reads to bypass writes; however, SQL Server, as standard, is not...
Therefore I would split the stored proc to use two transactions. The first transaction should just:
- be a SNAPSHOT (or READ UNCOMMITTED) transaction
- find the "Id" of the rows for the seats you wish to sell.
You should then commit (or abort) this transaction, and use a 2nd (hopefully very short) transaction that:
- most likely is READ COMMITTED (or maybe SERIALIZABLE)
- selects each row for update (use a locking hint)
- checks it has not been sold in the meantime (abort and start again if it has)
- sets the "IsSold" flag on the row.
(You may be able to do the above in a single UPDATE statement using "IN", and then check that the expected number of rows was updated - see the sketch below.)
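A rough T-SQL sketch of that second, short transaction, using the single-UPDATE-with-IN variant; the table name Allocation, the variables, and the literal values are all placeholders:

DECLARE @AllocationId int, @ExpectedSeats int;
SET @AllocationId = 1234;
SET @ExpectedSeats = 2;

BEGIN TRANSACTION;

UPDATE Allocation WITH (ROWLOCK, UPDLOCK)
SET    IsSold = 1
WHERE  AllocationId = @AllocationId
  AND  SeatNumber IN ('A01', 'A02')
  AND  IsSold = 0;               -- re-check nothing was sold in the meantime

IF @@ROWCOUNT = @ExpectedSeats
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;        -- someone got there first: abort and start again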
Sorry, sometimes you do need to understand what each type of transaction does and how locking works in detail.
If the table is smaller, then the update is shorter and the locks are held for less time.
Therefore consider splitting the table:
- so you have a table that JUST contains "AllocationId" and "IsSold".
- This table could be stored as a single b-tree (an index-organized table on AllocationId).
- As all the other indexes will be on the table that contains the details of the seat, no indexes should be locked by the update. (A sketch follows below.)
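A sketch of that split; all names and types are assumptions, and I've added SeatNumber to the key so rows stay unique per seat (the answer's minimal version keys on AllocationId alone):

-- narrow table touched by the hot update path, clustered on the lookup key
CREATE TABLE AllocationSeatStatus (
    AllocationId int     NOT NULL,
    SeatNumber   char(3) NOT NULL,
    IsSold       bit     NOT NULL,
    CONSTRAINT PK_AllocationSeatStatus
        PRIMARY KEY CLUSTERED (AllocationId, SeatNumber)
);
-- seat descriptions and all their secondary indexes live in a separate detail table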
I don't think you'd get anything out of table partitioning - the only improvement would be fewer disk reads from a smaller (shorter) index tree (each read will hit each level of the index at least once, so the fewer levels the quicker the read). However, I've got a table with a 4M+ row partition, indexed on 4 columns, with a net 10-byte key length. It fits in three index levels, with the topmost level 42.6% full. Assuming you had something similar, it seems reasonable that partitioning might only remove one level from the tree, and I doubt that's much of an improvement.
Some off-the-cuff hardware ideas:
- RAID 5 (and 50) can be slower on writes because of the parity calculation. Not an issue (or so I'm told) if the disk I/O cache is large enough to handle the workload, but if that's flooded you might want to look at RAID 10.
- Partition the table across multiple drive arrays. Take two (or more) RAID arrays, distribute the table across the volumes [files/filegroups, with or without table partitioning or partitioned views], and you've got twice the disk I/O speed, depending on where the data lies relative to the queries retrieving it. (If everything's on array #1 and array #2 is idle, you've gained nothing.)
Worst case, there's probably leading-edge or bleeding-edge technology out there that will blow your socks off. If it's critical to your business and you've got the budget, it might be worth some serious research.
How long is the update lock held for?
Why is the lock on the "table" and not just the "rows" being sold?
If the lock is held for more than a fraction of a second, that is likely to be your problem. SQL Server does not like you holding locks while users fill in web forms etc.
With SQL Server, you have to implement a "shopping cart" yourself, by temporarily reserving the seat until the user pays for it. E.g. add an "IsReserved" and a "ReservedAt" column; then any seat that has been reserved for more than n minutes should be automatically unreserved.
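A sketch of that approach; the table name Allocation and the 15-minute timeout are assumptions, the two columns come from the suggestion above:

-- add the reservation columns
ALTER TABLE Allocation ADD
    IsReserved bit NOT NULL DEFAULT 0,
    ReservedAt datetime NULL;

-- a periodic job releases stale reservations
UPDATE Allocation
SET    IsReserved = 0, ReservedAt = NULL
WHERE  IsReserved = 1
  AND  ReservedAt < DATEADD(minute, -15, GETDATE());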
This is a hard problem, as a shopper does not expect a seat that was in stock to be sold to someone else while he is checking out. However, you don't know if the shopper will ever complete the checkout, so how do you show it in the UI? Think about having a look at what other booking websites do, then copy one that your users already know how to use.
(Oracle can sometimes cope with locks being kept for a long time, but even Oracle is a lot faster and happier if you keep your locking short.)
I would first try to figure out why you are locking the table rather than just a row.
One thing to check out is the Execution plan of the Update statement to see what Indexes it causes to be updated and then make sure that row_level_lock and page_level_lock are enabled on those indexes.
You can do so with the following statement.
SELECT allow_row_locks, allow_page_locks FROM sys.indexes WHERE name = 'IndexNameHere';
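And if either turns out to be off, something like this (index and table names are placeholders) would switch them back on:

ALTER INDEX IndexNameHere ON dbo.TableNameHere
SET (ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);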
Here are a few ideas:
Make sure your data and logs are on separate spindles, to maximize write performance.
Configure your drives to only use the first 30% or so for data, and have the remainder be for backups (minimize seek / random access times).
Use RAID 10 for the log volume; add more spindles as needed for performance (write performance is driven by the speed of the log)
Make sure your server has enough RAM. Ideally, everything needed for a transaction should be in memory before the transaction starts, to minimize lock times (consider pre-caching). There are a bunch of performance counters you can check for this.
Partitioning may help, but it depends a lot on the details of your app and data...
I'm assuming that the T-SQL, indexes, transaction size, etc, have already been optimized.
In case it helps, I talk about this subject in detail in my book (including SSDs, disk array optimization, etc) -- Ultra-Fast ASP.NET.
I have a table myTable with a unique clustered index myId with a fill factor of 100%.
It's an integer, starting at zero (but it's not an identity column for the table).
I need to add a new type of row to the table.
It might be nice if I could distinguish these rows by using negative values of myId.
Would having negative values incur extra page splitting and slow down inserts?
Extra Background:
This table exists as part of the ETL for a data warehouse that gathers data from disparate systems. I now want to accommodate a new type of data. A way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered together. This will also avoid major key changes or extra columns in the schema.
Answer Summary:
A fill factor of 100% will normally slow down inserts, but not inserts that happen sequentially - and that includes the sequential negative inserts.
Besides the practical administration points you already got and the dubious use of negative ids to represent data-model attributes, there is also a valid question here: given a table with int ids from 0 to N, where would newly inserted negative values go, and would they cause additional splits?
The initial rows will be placed on the clustered index leaf pages, row with id 0 on first page and row with id N on the last page, filling the pages in between. When the first row with value of -1 is inserted, this will sort ahead of row with id 0 and as such will add a new page to the tree (will allocate an extent of 8 pages actually, but that is a different point) and will link the page in front of the leaf level linked list of pages. This will NOT cause a page split of the former first page. On further inserts of values -2, -3 etc they will go to the same new page and they will be inserted in the proper position (-2 ahead of -1, -3 ahead of -2 etc) until the page fills. Further inserts will add a new page ahead of this one, that will accommodate further new values. Inserts of positive values N+1, N+2 will go at the last page and be placed in it until it fills, then they'll cause a new page to be added and will start filling that page.
So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well; an insert at either end of the cluster should not split a non-leaf page either. I am not discussing the impact of updates here, of course (they can cause splits if they increase the length of a variable-length column).
Lately there has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page-split performance hit is visible during inserts, then you'll probably be hurt worse by the 'mitigation' measures, because you'll create artificial page-latch hot spots that are far worse, as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation, which impacts data access time. I say that is best mitigated with an off-peak, periodic index maintenance operation (reorganize). Avoid premature optimizations; always measure first.
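If you do want to measure rather than guess, a sketch along these lines (using the myTable name from the question; the DMV itself is standard) shows index depth, page counts, and the fragmentation that frequent mid-index splits would drive up:

SELECT index_level,
       page_count,
       avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL, 'DETAILED');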
Not enough to notice for any reasonable system.
Page splits happen when a page is full, either at the start or at the end of the range.
As long as you do regular index maintenance...
Edit, after fill factor comments:
After a page split with 90 or 100 FF, each page will be 50% full. FF = 100 only means a split will happen sooner (probably on the 1st insert).
With a strictly monotonically increasing (or decreasing) key (+ve or -ve), a page split happens at either end of the range.
However, from BOL, FILLFACTOR - "Adding Data to the End of the Table":
"A nonzero fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table. However, if all the data is added to the end of the table, the empty space in the index pages will not be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. If existing rows will be updated with data that lengthens the size of the rows, use a fill factor of less than 100. The extra bytes on each page will help to minimize page splits caused by extra length in the rows."
So does fill factor matter for strictly monotonic keys...? Especially if it's low-volume writes?
No, not at all. Negative values are just as valid INTegers as positive ones - no problem. Basically, internally, they're all just 4 bytes' worth of zeroes and ones :-)
Marc
You are asking the wrong question!
If you create a clustered index that has a fillfactor of 100%, every time a record is inserted, deleted or even modified, page splits can occur because there is likely no room on the existing index data page to write the change.
Even with regular index maintenance, a fill factor of 100% is counterproductive on a table where you know inserts are going to be performed. A more usual value would be 90% (a sketch of such a rebuild follows below).
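For example, a rebuild along these lines would apply that; the index name is a placeholder:

ALTER INDEX PK_myTable ON dbo.myTable
REBUILD WITH (FILLFACTOR = 90);   -- placeholder names; 90 per the suggestion above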
I'm concerned that this post may have taken a wrong turn, in that there seems to be an underlying design issue at work here, irrespective of the resultant page splits.
Why do you need to introduce a negative ID?
An integer primary key, for example, should uniquely identify a row; its sign should be irrelevant. I suspect that there may be a definition issue with the primary key for your table if this is not the case.
If you need to flag/identify the newly inserted records then create a column specifically for this purpose.
This solution would be ideal because you may then be able to ensure that your primary key is sequential (perhaps using an Identity data type, although not essential), thereby avoiding issues with page splits (on insert) altogether.
Also, to confirm if I may: a fill factor of 100% for a clustered index primary key (an identity integer, for example) will not cause page splits for sequential inserts!
I have a table with more than a million rows. This table is used to index TIFF images. Each image has fields like date, number, etc. I have users who index these images in batches of 500. I need to know whether it is better to first insert 500 rows and then perform 500 updates, or, when the user finishes indexing, to do the 500 inserts with all the data. A very important thing is that if I do the 500 inserts first, this time is free for me because I can do it the night before.
So the question is: is it better to do inserts, or inserts and updates, and why? I have defined an id value for each image, and I also have other indexes on the fields.
Updates in SQL Server result in ghosted rows - i.e., SQL Server crosses one row out and puts a new one in. The crossed-out row is deleted later.
Both inserts and updates can cause page splits in this way; they both effectively 'add' data, it's just that updates flag the old stuff out first.
On top of this, updates need to look up the row first, which for lots of data can take longer than the update itself.
Inserts will just about always be quicker, especially if they are either in order or if the underlying table doesn't have a clustered index.
When inserting larger amounts of data into a table look at the current indexes - they can take a while to change and build. Adding values in the middle of an index is always slower.
You can think of it like appending to an address book: Mr Z can just be added to the last page, while you'll have to find space in the middle for Mr M.
Doing the inserts first and then the updates does seem to be a better idea for several reasons. You will be inserting at a time of low transaction volume. Since inserts have more data, this is a better time to do it.
Since you are using an id value (which is presumably indexed) for updates, the overhead of updates will be very low. You would also have less data during your updates.
You could also turn off transactions at the batch (500 inserts/updates) level and use it for each individual record, thus reducing some overhead.
Finally, test this out to see the actual performance on your server before making a final decision.
This isn't a cut-and-dried question. Krishna's and Galegian's points are spot on.
For updates, the impact will be lessened if the updates affect fixed-length fields. If you are updating varchar or blob fields, you may add the cost of page splits during the update when the new value exceeds the length of the old value.
I think inserts will run faster. They do not require a lookup (when you do an update you are basically doing the equivalent of a select with the where clause). And also, an insert won't lock the rows the way an update will, so it won't interfere with any selects that are happening against the table at the same time.
The execution plan for each query will tell you which one should be more expensive. The real limiting factor will be the writes to disk, so you may need to run some tests while running perfmon to see which query causes more writes and causes the disk queue to get the longest (longer is bad).
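As a sketch, SET STATISTICS IO/TIME is another cheap way to see those reads and writes per statement; the table and column names below are only placeholders:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run a representative batch of each and compare the reported reads/writes
INSERT INTO images (id, doc_date, doc_number) VALUES (1, '2009-01-01', 'A-001');
UPDATE images SET doc_date = '2009-01-02' WHERE id = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;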
I'm not a database guy, but I imagine doing the inserts in one shot would be faster because the updates require a lookup whereas the inserts do not.