What is the maximum number of child nodes that each node can have in the B-tree used in SQLite? Are those numbers similar to those of other relational databases?
SQLite uses a fixed page size that defaults to 4096 bytes but can be set to any power of two between 512 and 65536. There is some fixed overhead per page (8 bytes for leaf pages, 12 bytes for interior pages) and some fixed overhead per slot (2 bytes in the indirection vector, plus varying amounts depending on the page type and on whether it is an index or a table). The keys/records themselves occupy varying amounts of space depending on their structure and content, and on whether data has been spilled into overflow pages. In that regard the layout of B-tree pages in SQLite is similar to the layouts used in many other relational databases, and it achieves similar levels of occupancy.
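As a rough illustration of how those numbers combine, here is a back-of-the-envelope sketch of the per-page fan-out; the average cell size is an assumption you would have to measure for your own data, not a figure from the file format.

```python
# Rough, illustrative estimate of how many cells fit on a SQLite b-tree page.
# Page/slot overheads are from the file-format docs; avg_cell_bytes is assumed.

def estimated_cells_per_page(page_size=4096, interior=True, avg_cell_bytes=16):
    header = 12 if interior else 8   # fixed page header size
    slot = 2                         # entry in the cell pointer array
    usable = page_size - header
    return usable // (slot + avg_cell_bytes)

# e.g. an interior table page whose cells are a 4-byte child page number plus
# a small varint rowid key (~5 bytes assumed here):
print(estimated_cells_per_page(4096, interior=True, avg_cell_bytes=9))  # ~371
```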
What sets SQLite a bit apart is its heavy use of variant-typed values, variable-length integers (varints), and its near-universal row overflow capability. This introduces so many variables that a size/occupancy estimate is nowhere near as straightforward, accurate, and reliable as for, say, classic B-tree tables in MS SQL Server. It is certainly beyond my limited capabilities, unfortunately...
You can read the whole story in the "B-tree Pages" section of the Database File Format documentation at sqlite.org.
P.S.: please heed Shawn's comment regarding the sqlite3_analyzer program. I told you at length why it is difficult to say for sure whether God exists, and Shawn points you at a program that simply goes and gives you His bleedin' phone number. ;-)
I'm studying the SAP HANA main-memory database.
It has an index type called CPBTree. Its documentation describes it as follows:
CPB+-tree stands for Compressed Prefix B+-Tree; this index tree type is based on pkB-tree. CPB+-tree is a very small index because it uses 'partial key' that is only part of full key in index nodes.
This is a bit vague, and there is no other explanation of the CPBTree structure on the Internet.
Can anyone explain it in more detail or point me to a good document?
Where to begin here?
B-trees are very intensely studied and developed data structures, so pointing to a single document that explains all aspects relevant to this question and SAP HANA is a bit difficult.
Maybe it helps to unpack the term first:
Compressed Prefix
This basically means that the B-tree index and leaf nodes do not contain the full key strings. Instead, the parts of the key strings that are common among the keys (the prefixes) are stored separately. The leaf and index nodes then only contain
a pointer to the prefix,
a sort of "delta" that contains the remaining key (this is where the partial key from the pkB-tree comes in),
and a pointer to the data record (row ID).
This technique is rather common in many DBMSs, usually as part of a feature called "index compression" or something similar.
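To make the idea concrete, here is a minimal sketch of prefix compression in general; the node layout and names are illustrative assumptions, not HANA's actual CPB+-tree format.

```python
import os

# Minimal sketch of prefix compression: store the shared prefix once per node
# and keep only the partial keys (suffixes). Illustrative only.

def compress_node(keys):
    """Split a sorted list of keys into (common_prefix, partial_keys)."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

def contains(node, key):
    prefix, partials = node
    return key.startswith(prefix) and key[len(prefix):] in partials

node = compress_node(["customer_0001", "customer_0042", "customer_0999"])
print(node)                             # ('customer_0', ['001', '042', '999'])
print(contains(node, "customer_0042"))  # True
```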
So, now we know that HANA uses compressed B-tree indexes (for row-store tables and for data that can be expressed as strings).
Why is this important for an in-memory database like HANA?
In short: memory transfer effort between RAM and CPU.
The smaller the index structure, the more of it can fit into the CPU caches. To traverse (go through) the index, fewer back-and-forth movements of data have to be performed.
It's a huge performance advantage.
This is complemented with specific "cache-conscious" index protocols (how the index structure is used by the HANA kernel) that try to minimize the RAM-CPU data transfers.
All this is an oversimplified explanation, but I hope it helps make sense of things nevertheless.
If you want to "dive deeper" and start reading academic papers on the topic, a good starting point is Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems by Prof. Sang K. Cha et al.
This is the same Sang K. Cha that created P*Time, an in-memory (row-store) DBMS in the early 2000s.
P*Time was, as is fairly well known, acquired by SAP (like so many other DBMS companies... Sybase... MaxDB... OrientDB...) and the technology was used as a research base for what would become SAP HANA.
Nowadays only a small part of P*Time remains in SAP HANA, and it survives mostly as concepts and algorithms rather than as actual P*Time code.
All in all, for the user of HANA (developer, admin, data consumer) the specifics of this index implementation hardly matter as none of them can interact with the index structure directly.
What matters is that this index takes modern server systems (many cores, large CPU caches, lots of RAM) and extracts great performance from them, while still allowing for "high-speed" transactions.
I added an extended write-up of this answer to my blog: https://lbreddemann.org/what-is-cpb-tree-in-sap-hana/.
I'm looking for a space-efficient key-value mapping/dictionary/database which satisfies certain properties:
Format: The keys will be represented by http(s) URIs. The values will be variable length binary data.
Size: There will be 1-100 billion unique keys (average length 60-70 bytes). Values will initially only be a few tens of bytes but might eventually grow to tens of kilobytes in size (perhaps even more if I decide to store multiple versions). The total size of the data will be measured in terabytes or petabytes.
Hardware: The data will have to be distributed across multiple machines. This distribution should ensure that all URIs from a particular domain end up on the same machine. Furthermore, data on a machine will have to be distributed between the RAM, SSD, and HDD according to how frequently it is accessed. Data will have to be shifted around as machines are added or removed from the cluster. Replication is not needed initially, but might be useful later.
Access patterns: I need both sequential and (somewhat) random access to the data. The sequential access will be from a low-priority batch process that continually scans through the data. Throughput is much more important than latency in this case. Ideally, the iteration will proceed lexicographically (i.e. dictionary order). The random accesses arise from accessing the URIs in an HTML page; I expect that most of these will point to URIs from the same domain as the page and hence will be located on the same machine, while others will be located on different machines. I anticipate needing at most 100,000 to 1,000,000 in-memory random accesses per second. The data is not static. Reads will occur one or two orders of magnitude more often than writes.
Initially, the data will be composed of 100 million to 1 billion URLs with several tens of bytes of data per URL. It will be hosted on a small number of cheap commodity servers with 10-20 GB of RAM and several TB of hard drives. In this case, most of the space will be taken up storing the keys and indexing information. For this reason, and because I have a tight budget, I'm looking for something which will allow me to store this information in as little space as possible. In particular, I'm hoping to exploit the common prefixes shared by many of the URIs. In this way, I believe it might be possible to store the keys and index in less space than the total length of the URIs.
I've looked at several traditional data structures (hash maps; self-balancing trees such as red-black, AVL, and B-trees; tries). Only the tries (with some tricks) seem to have the potential for reducing the size of the index and keys (all the others store the keys in addition to the index). The most promising option I've thought of is to split URIs into several components (e.g. example.org/a/b/c?d=e&f=g becomes something like [example, org, a, b, c, d=e, f=g]). The various components would each index a child in subsequent levels of a tree-like structure, kind of like a filesystem. This seems profitable, as a lot of URIs share the same domain and directory prefix.
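To illustrate what I mean by splitting the URIs into components, here is a rough sketch using a nested-dict trie; the splitting rules and names are simplified assumptions, not a full URI parser.

```python
from urllib.parse import urlsplit

_VALUE = object()  # sentinel key under which the stored value lives

def components(uri):
    """Split a URI into reversed host labels plus path/query segments."""
    parts = urlsplit(uri)
    host = list(reversed(parts.netloc.split(".")))        # ['org', 'example']
    path = [seg for seg in parts.path.split("/") if seg]  # ['a', 'b', 'c']
    return host + path + ([parts.query] if parts.query else [])

def put(trie, uri, value):
    node = trie
    for comp in components(uri):
        node = node.setdefault(comp, {})   # shared prefixes are stored once
    node[_VALUE] = value

def get(trie, uri):
    node = trie
    for comp in components(uri):
        node = node.get(comp)
        if node is None:
            return None
    return node.get(_VALUE)

trie = {}
put(trie, "http://example.org/a/b/c?d=e&f=g", b"\x01\x02")
print(get(trie, "http://example.org/a/b/c?d=e&f=g"))  # b'\x01\x02'
```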
Unfortunately, I don't know much about the various database offerings. I understand that a lot of them use B-trees to index the data. As I understand it, the space required by the index and keys exceeds the total length of the URLs.
So, I would like to know if anyone can offer some guidance as to any data structures or databases that can exploit the redundancy in the URIs to save space. The other stuff is less important, but any help there would be appreciated too.
Thanks, and sorry for the verbosity ;)
For a B-tree of order m, every node except the root must contain m-1 to 2m-1 elements, where every element is at least a key and maybe also some additional data (e.g., a value). Yet each node must have some constant total size picked to give good performance on the underlying block device. So what happens if your elements are of variable size?
SQLite3 seems to have a scheme for tacking additional block-sized pieces onto its nodes, and MySQL lets you declare the size of your records (e.g., you can type your fields to be not just strings but strings under some size). What other solutions are there? And what do people think about when picking one over the other?
edit: And by the previous sentence, I mean, what do database developers think about when deciding to implement their B-trees one way over the other?
(I'm in a databases course right now, so I'm more interested in the theory and design angle than in details of particular systems.)
I know that SQL Server can have keys up to 900 bytes long with a page size of 8192 bytes. If you actually have 900-byte keys, only 9 (or 8) rows will fit on an index's intermediate-level pages. This just means that the branching factor is lower than usual. It might violate the theoretical B-tree invariant, but that is just an academic concern which does not impede performance in a significant way. It does not change the asymptotic complexity of the algorithms involved.
In short: This is a purely academic concern.
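A quick back-of-the-envelope check of the fan-out claim above; the per-row overhead figure is an assumption for illustration, not SQL Server's exact layout.

```python
# Worst-case fan-out of an intermediate index page with 900-byte keys.
page_bytes = 8192
key_bytes = 900
row_overhead = 11   # assumed per-row slot/header overhead

print(page_bytes // (key_bytes + row_overhead))  # 8 -- only a handful of entries
```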
I think this is quite a good question. Although RDBMS vendors all have slightly different implementations, the underlying theory is the same and I doubt anyone uses b-tree implementations as the determining factor in choosing a vendor.
As I understand it, the basic structure of each B-tree page contains keys and pointers. The pointers reference other pages containing more keys and pointers, with the final pointer referencing the associated data record.
How to handle variable-length keys is interesting. Perhaps others can shed some light on vendor-specific solutions.
What are good sizes for data types in SQL Server? When defining columns, I see data types with sizes of 50 as one of the default sizes (e.g. nvarchar(50), binary(50)). What is the significance of 50? I'm tempted to use sizes that are powers of 2; is that better or just useless?
Update 1
Alright, thanks for your input, guys. I just wanted to know the best way to choose the size of a data type for a column.
There is no reason to use powers of 2 for performance, etc. The data length should be determined by the size of the data being stored.
Why not the traditional powers of 2, minus 1 such as 255...
Seriously, the length should match what you need and what is suitable for your data.
Nothing else matters: not how the client uses it, not alignment to a 32-bit word boundary, not powers of 2, birthdays, Scorpio rising in Uranus, or a roll of the dice...
The reason so many fields have a length of 50 is that SQL Server defaults to 50 as the length for most data types where length is an issue.
As has been said, the length of a field should be appropriate to the data being stored there, not least because there is a limit to the length of a single record in SQL Server (it's ~8000 bytes), and it is possible to blow past that limit.
Also, the length of your fields can be considered part of your documentation. I don't know how many times I've met lazy programmers who claim they don't need to document because the code is self-documenting, and who then don't bother doing the things that would make the code self-documenting.
You won't gain anything from using powers of 2. Make the fields as long as your business needs really require them to be - let SQL Server handle the rest.
Also, since the SQL Server page size is limited to 8K (of which 8060 bytes are available to user data), making your variable length strings as small as possible (but as long as needed, from a requirements perspective) is a plus.
That 8K limit is a fixed SQL Server system setting which cannot be changed.
Of course, SQL Server these days can handle more than 8K of data in a row, using so called "overflow" pages - but it's less efficient, so trying to stay within 8K is generally a good idea.
Marc
The size of a field should be appropriate for the data you are planning to store there, global defaults are not a good idea.
It's a good idea for the whole row to fit into a page several times over without leaving too much free space.
A row cannot span two pages, and a page has 8096 bytes of free space, so two rows that take 4049 bytes each will occupy two pages.
See docs on how to calculate the space occupied by one row.
Also note that the VAR in VARCHAR and VARBINARY stands for "varying", so if you put a 1-byte value into a 50-byte column, it will take up only 1 byte (plus a small length overhead).
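As a quick illustration of the rows-per-page arithmetic above, assuming the ~8096 bytes of usable space per data page mentioned earlier:

```python
USABLE_PER_PAGE = 8096   # bytes of free space on a SQL Server data page

def pages_needed(row_bytes, row_count):
    rows_per_page = USABLE_PER_PAGE // row_bytes  # rows cannot span pages
    return -(-row_count // rows_per_page)         # ceiling division

print(pages_needed(4049, 2))  # 2 pages: only one 4049-byte row fits per page
print(pages_needed(4048, 2))  # 1 page: two such rows fit exactly
```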
This totally depends on what you are storing.
If you need x characters, use x, not some arbitrary predefined amount.
Suppose you have a really large table, say a few billion unordered rows, and now you want to index it for fast lookups. Or maybe you are going to bulk load it and order it on the disk with a clustered index. Obviously, when you get to a quantity of data this size you have to stop assuming that you can do things like sorting in memory (well, not without going to virtual memory and taking a massive performance hit).
Can anyone give me some clues about how databases handle large quantities of data like this under the hood? I'm guessing there are algorithms that use some form of smart disk caching to handle all the data but I don't know where to start. References would be especially welcome. Maybe an advanced databases textbook?
Multiway merge sort is the keyword for sorting huge amounts of data that do not fit in memory.
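Here is a minimal sketch of the idea, assuming newline-terminated records; chunk sizing and temp-file handling are simplified for illustration.

```python
import heapq
import tempfile

def external_sort(lines, chunk_size=100_000):
    """Sort an iterable of newline-terminated strings that may not fit in RAM."""
    runs = []
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            runs.append(_spill(sorted(chunk)))   # write a sorted run to disk
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    # k-way merge of the runs; each run file is read back sequentially
    return heapq.merge(*(open(r.name) for r in runs))

def _spill(sorted_chunk):
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.writelines(sorted_chunk)
    f.flush()
    return f

# usage: for line in external_sort(open("huge_unsorted.txt")): ...
```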
As far as I know, most indexes use some form of B-tree, which does not need to keep everything in memory. You can simply put the nodes of the tree in a file and then jump to various positions in the file. This can also be used for sorting.
Are you building a database engine?
Edit: I built a disc-based database system back in the mid '90s.
Fixed-size records are the easiest to work with because the file offset for locating a record can simply be calculated as a multiple of the record size. I also had some files with variable record sizes.
My system needed to be optimized for reading. The data was actually stored on CD-ROM, so it was read-only. I created binary search tree files for each column I wanted to search on. I took an open source in-memory binary search tree implementation and converted it to do random access of a disc file. Sorted reads from each index file were easy and then reading each data record from the main data file according to the indexed order was also easy. I didn't need to do any in-memory sorting and the system was way faster than any of the available RDBMS systems that would run on a client machine at the time.
For fixed-size records, the index can just keep track of the record number. For variable-length data records, the index just needs to store the offset within the file where the record starts, and each record needs to begin with a structure that specifies its length.
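A small sketch of both schemes described above; the record size and the length-prefix format are assumptions for illustration.

```python
import struct

RECORD_SIZE = 64  # assumed fixed record size in bytes

def read_fixed_record(f, record_number):
    # record N lives at byte offset N * RECORD_SIZE
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE)

def read_var_record(f, offset):
    # the index stores a byte offset; each record starts with its own length,
    # here assumed to be a 4-byte little-endian prefix
    f.seek(offset)
    (length,) = struct.unpack("<I", f.read(4))
    return f.read(length)
```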
You would have to partition your data set in some way and spread each partition out on a separate server's RAM. If I had a billion 32-bit ints, that's 4 GB of RAM right there. And that's only your index.
For low-cardinality data, such as gender (only two values: Male, Female), you can represent each index entry in less than a byte. Oracle uses a bitmap index in such cases.
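A toy sketch of a bitmap index for such a low-cardinality column; real implementations (Oracle's included) also compress the bitmaps, which this ignores.

```python
def build_bitmap_index(column):
    """One bit array per distinct value; bit i is set when row i has that value."""
    bitmaps = {}
    for row_id, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)  # ints as bit arrays
    return bitmaps

rows = ["M", "F", "F", "M", "F"]
index = build_bitmap_index(rows)
print([i for i in range(len(rows)) if (index["F"] >> i) & 1])  # [1, 2, 4]
```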
Hmm... Interesting question.
I think that most of the commonly used database management systems rely on the operating system's memory-management mechanisms, and when physical memory runs out, the in-memory tables get swapped out.