Why don't PostgreSQL indexes contain visibility information?

I know that the physical storage in PostgreSQL looks like:
heap table:
<old_tuple, t_xmin, t_xmax>
<new_tuple, t_xmin, t_xmax>
index:
<old_index_value, old_RID>
<new_index_value, new_RID>
So an index-only scan needs the help of the visibility map.
My question is: why don't we store t_xmin and t_xmax in the index as well?
Like this:
index:
<old_index_value, old_RID, t_xmin, t_xmax>
<new_index_value, new_RID, t_xmin, t_xmax>

There is relatively high overhead: t_xmin and t_xmax take 8 bytes together, and will probably take 16 bytes in the future. If Postgres stored these values in the index, almost all numeric indexes would be roughly two (bigint) to three (int) times bigger.
Today that is probably not a problem, but Postgres's beginnings date back to the mid-1980s, when disk capacity was a serious constraint.
A second motivation is probably code complexity and ensuring data consistency without heavy locking. Indexes in Postgres were designed as data-access accelerators, not as a source of data, which keeps the implementation simpler. Ingres was designed by very smart professors and students, and a more robust, less complex (but possibly slower) design was preferred.
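A minimal sketch, assuming a hypothetical table named t, of the visibility-map dependence the question describes; the "Heap Fetches" counter in EXPLAIN ANALYZE output shows how often an index-only scan still had to visit the heap because a page was not yet marked all-visible:
CREATE TABLE t (id bigint PRIMARY KEY, payload text);
INSERT INTO t SELECT g, md5(g::text) FROM generate_series(1, 100000) g;
VACUUM t;  -- sets the all-visible bits in the visibility map
EXPLAIN (ANALYZE) SELECT id FROM t WHERE id BETWEEN 100 AND 200;
-- expect "Index Only Scan ... Heap Fetches: 0" after the VACUUM; without it,
-- Heap Fetches is non-zero because tuple visibility (t_xmin/t_xmax) must
-- still be checked against the heap.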

Related

Index performance on PostgreSQL for big tables

I have been searching for good information about index benchmarking on PostgreSQL and found nothing really useful.
I need to understand how PostgreSQL behaves while handling a huge amount of records.
Let's say 2000M records on a single non-partitioned table.
Theoretically, b-trees are O(log(n)) for reads and writes, but in practice
I think that's an idealized scenario that ignores things like HUGE indexes not fitting entirely in memory (swapping?) and maybe other things I am not seeing at this point.
There are no JOIN operations, which is fine, but note this is not an analytical database, and response times below 150 ms (the lower the better) are required. All searches are expected to use indexes, of course. We have 2-3 indexes (a minimal DDL sketch of the assumed schema follows the list):
UUID PK index
timestamp index
VARCHAR(20) index (non unique but high cardinality)
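A minimal DDL sketch of the assumed schema (the table name big_events and column names id, created_at, ref_code are hypothetical), just to make the index set concrete:
CREATE TABLE big_events (
    id         uuid PRIMARY KEY,            -- UUID PK index
    created_at timestamptz NOT NULL,
    ref_code   varchar(20) NOT NULL
);
CREATE INDEX ON big_events (created_at);    -- timestamp index
CREATE INDEX ON big_events (ref_code);      -- non-unique but high-cardinality index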
My concern is how writes and reads will perform once the table reaches its expected top capacity (2500M records).
... so specific questions might be:
May "adding more iron" achieve reasonable performance in such scenario?
NOTE this is non-clustered DB so this is vertical scaling.
What would be the main sources of time consumption either for reads and writes?
What would be the amount of records on a table that we can consider "too much" for this standard setup on PostgreSql (no cluster, no partitions, no sharding)?
I know this amount of records suggests taking some alternative (sharding, partitioning, etc) but this question is about learning and understanding PostgreSQL capabilities more than other thing.
There should be no performance degradation when inserting into or selecting from a table, even if it is large. The cost of an index access grows with the logarithm of the table size, but the base of the logarithm is large, and the depth of the index will rarely exceed 5 or 6. The upper levels of the index are probably cached, so you end up with only a handful of blocks read when accessing a single table row per index scan. Note that you don't have to cache the whole index.
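A hedged way to verify the depth claim on a concrete index, using the pgstattuple extension (the index name big_events_pkey is hypothetical):
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tree_level, index_size, avg_leaf_density FROM pgstatindex('big_events_pkey');
-- tree_level of 3 or 4 is typical even for billions of rows, which is why a
-- single lookup touches only a handful of pages, most of them cached.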

What is the degree of the B-Tree in SQLite?

What is the maximum number of child nodes that each node can have in the B-Tree used in SQLite? Are those numbers similar to other relational databases?
SQLite uses a fixed page size that defaults to 4096 bytes but which can be set to any power of two between 512 and 65536. There is some fixed overhead per page (8 bytes for leaf pages, 12 bytes for interior pages), some fixed overhead per slot (2 bytes in the indirection vector, plus varying amounts depending on the page type and on whether it's an index or a table), and the keys/records occupy varying amounts of space depending on their structure and content, and on whether data has spilled into overflow pages. In that regard the layout of B-tree pages in SQLite is similar to the layouts used in many other relational databases, and it achieves similar levels of occupancy.
What sets SQLite a bit apart is the heavy use of variants, variable-length integers (varint) and the quasi universal row overflow capability. This introduces so many variables that a size/occupancy estimation is nowhere near as straightforward, accurate and reliable as for, say, classic B-tree tables in MS SQL Server. It is certainly beyond my limited capabilities, unfortunately...
You can read the whole story in the section B-tree Pages of the Database file format documentation at sqlite.org.
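A rough, hedged illustration of the fanout this implies, using the default 4096-byte page size (the ~14 bytes per interior cell is only an assumed average covering the 2-byte slot pointer, the 4-byte child page number and a varint key, so the result is an order of magnitude, not an exact figure):
PRAGMA page_size;     -- e.g. 4096 for a database using the default
PRAGMA page_count;
-- (4096 - 12 bytes of interior-page header) / ~14 bytes per cell ≈ 290 children
-- per interior node, so a tree only 3-4 levels deep covers a very large table.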
P.S.: please heed Shawn's comment regarding the sqlite3 analyser program. I told you at length why it is difficult to say for sure whether God exists, and Shawn points you at a program that simply goes and gives you His bleedin' phone number. ;-)

Why does PostgreSQL (timescaledb) use more storage than expected for a table?

I'm new to databases. Recently I started using timescaledb, which is an extension of PostgreSQL, so I guess this is also PostgreSQL related.
I observed a strange behavior. I calculated my table structure: 1 timestamp and 2 doubles, so 24 bytes per row in total. I imported (via psycopg2 copy_from) 2,750,182 rows from a CSV file. I calculated manually that the size should be about 63 MB, but when I query timescaledb it tells me the table size is 137 MB, the index size is 100 MB, and the total is 237 MB. I was expecting the table size to match my calculation, but it doesn't. Any idea?
There are two basic reasons your table is bigger than you expect:
1. Per tuple overhead in Postgres
2. Index size
Per tuple overhead: an answer to a related question goes into detail that I won't repeat here, but basically Postgres uses 23 (+ padding) bytes per row for various internal things, mostly multi-version concurrency control (MVCC) management (Bruce Momjian has some good intros if you want more info). That overhead gets you pretty darn close to the 137 MB you are seeing. The rest might be due to either the fill factor setting of the table or any dead rows still included in the table from, say, a previous insert and subsequent delete.
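A hedged back-of-envelope for the numbers in the question, assuming the standard heap layout (23-byte tuple header padded to 24 bytes, plus a 4-byte line pointer per row inside each page):
-- 24 bytes data + 24 bytes header + 4 bytes line pointer = 52 bytes per row
-- 2,750,182 rows * 52 bytes ≈ 143 MB ≈ 136 MiB, essentially the reported 137 MB
-- (ignoring per-page headers and unused space). You can cross-check with:
SELECT pg_size_pretty(pg_relation_size('my_table'));  -- table name is hypothetical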
Index Size: Unlike some other DBMSs, Postgres does not organize its tables on disk around an index, unless you manually cluster the table on an index, and even then it will not maintain the clustering over time (see https://www.postgresql.org/docs/10/static/sql-cluster.html). Rather, it keeps its indexes separately, which is why there is extra space for your index. If on-disk size is really important to you and you aren't using your index for, say, uniqueness constraint enforcement, you might consider a BRIN index, especially if your data is going in with some ordering (see https://www.postgresql.org/docs/10/static/brin-intro.html).
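A minimal sketch of the BRIN suggestion, assuming a hypothetical table readings with a timestamp column ts:
CREATE INDEX readings_ts_brin ON readings USING brin (ts);
-- a BRIN index stores one summary entry per range of table blocks instead of
-- one entry per row, so it usually stays tiny even for millions of rows, at
-- the cost of less precise (block-range) lookups that must be rechecked.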

How does SELECTing from a table scale with table size?

When I am searching for rows satisfying a certain condition:
SELECT something FROM table WHERE type = 5;
Is the difference in time linear when I execute this query on a table containing 10K rows versus 10M rows?
In other words, is this kind of query 1000 times faster on a 10K-row table than on a 10M-row table?
My table contains a column type which holds numbers from 1 to 10. The most frequent query on this table will be the one above. If the performance difference is real, I will have to create 10 tables, one for each type, to achieve better performance. If this is not really an issue, I will have two tables: one for the types, and a second one for the data with a type_id column.
EDIT:
There are multiple rows with each type value.
(Answer originally tagged postgresql and this answer is in those terms. Other DBMSes will vary.)
Like with most super broad questions, "it depends".
If there's no index present, then time is probably roughly linear, though with a nearly fixed startup cost plus some breakpoints - e.g. from when the table fits in RAM to when it no longer fits in RAM. All sorts of effects can come into play - memory banking and NUMA, disk readahead, parallelism in the underlying disk subsystem, fragmentation on the file system, MVCC bloat in the tables, etc - that make this far from simple.
If there's a b-tree index on the attribute in question, time is going to increase at a less than linear rate - probably around O(log n). How much less will vary based on whether the index fits in RAM, whether the table fits in RAM, etc. However, PostgreSQL usually then has to do a heap lookup for each index pointer, which adds random I/O cost rather unpredictably depending on the data distribution/clustering, caching and readahead, etc. It might be able to do an index-only scan, in which case this secondary lookup is avoided, if vacuum is running enough.
So ... in extremely simplified terms, no index = O(n), with index ~= O(log n). Very, very approximately.
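A hedged sketch of how to observe this yourself, reusing the column names from the question (the table name big_table is hypothetical; the exact plan the planner picks depends on selectivity and table size):
EXPLAIN (ANALYZE) SELECT something FROM big_table WHERE type = 5;  -- Seq Scan: roughly O(n)
CREATE INDEX ON big_table (type);
EXPLAIN (ANALYZE) SELECT something FROM big_table WHERE type = 5;
-- now typically a Bitmap Index Scan + Bitmap Heap Scan; note that with only 10
-- distinct values each type matches ~10% of the rows, so the planner may still
-- judge a sequential scan to be cheaper.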
I think the underlying intent of the question is along the lines of: Is it faster to have 1000 tables of 1000 rows, or 1 table of 1,000,000 rows?. If so: In the great majority of cases the single bigger table will be the better choice for performance and administration.

Why do DB indexes use balanced trees, not hash tables?

Hash tables seem to be preferable in terms of disk access. What is the real reason that indexes are usually implemented with a tree?
Sorry if it's an infantile question, but I did not find a straight answer on SO.
One of the common actions with data is to sort it or to search for data in a range - a tree will contain data in order while a hash table is only useful for looking up a row and has no idea of what the next row is.
So hash tables are no good for this common case (hat tip to this answer), for example:
SELECT * FROM MyTable WHERE Val BETWEEN 10000 AND 12000
or
SELECT * FROM MyTable ORDER BY x
Obviously there are cases where hash tables are better, but it's best to deal with the main cases first.
Size: b-trees start small and perfectly formed and grow nicely to enormous sizes. Hash tables have a fixed size, which can be too big (10,000 buckets for 1,000 entries) or too small (10,000 buckets for 1,000,000,000 entries) for the amount of data you have.
Hash tables provide no benefit for this case:
SELECT * FROM MyTable WHERE Val BETWEEN 10000 AND 12000
One only has to look at MySQL's hash index implementation, associated with the MEMORY storage engine, to see its disadvantages:
They can be used with equality operators such as = but not with comparison operators such as <
The optimizer cannot use a hash index to speed up ORDER BY operations.
Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
Optimizer cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use).
And note that the above applies to hash indexes implemented in memory, without the added consideration of disk access matters associated with indexes implemented on disk.
Disk access factors as noted by #silentbicycle would skew it in favour of the balanced-tree index even more.
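A hedged PostgreSQL-flavoured sketch of the same limitation (table and column names follow the earlier example query; PostgreSQL hash indexes support only the = operator):
CREATE INDEX mytable_val_hash ON MyTable USING hash (Val);
-- usable for:     SELECT * FROM MyTable WHERE Val = 11000;
-- not usable for: SELECT * FROM MyTable WHERE Val BETWEEN 10000 AND 12000;
-- a plain b-tree index, CREATE INDEX ON MyTable (Val), serves both queries.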
Databases typically use B+ trees (a specific kind of tree), since they have better disk access properties - each node can be made the size of a filesystem block. Doing as few disk reads as possible has a greater impact on speed, since comparatively little time is spent on either chasing pointers in a tree or hashing.
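A rough worked example of the resulting fanout, under assumed numbers (an 8 KB page and about 20 bytes per key-plus-child-pointer entry, both purely illustrative):
  8192 bytes / ~20 bytes per entry ≈ 400 children per interior node;
  a tree of height 4 then covers roughly 400^3 ≈ 64 million leaf pages
  (billions of rows), yet a single lookup reads only about 4 pages.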
Hashing is good when the data is not increasing; more technically, when N/n is constant,
where N = number of elements and n = number of hash slots.
If this is not the case, hashing doesn't give a good performance gain.
In a database the data will most probably be increasing at a significant pace, so using a hash there is not a good idea.
And yes, sorting is there too...
"In database most probably the data would be increasing a significant pace so using hash there is not a good idea."
That is an over-exaggeration of the problem. Yes hash spaces must be fixed in size (modulo solutions ala extensible hashing) and yes, their size must be managed, and yes, someone must do that job.
That said, the performance gains if you exploit hash-based physical location to its fullest potential, are enormous.
