How multiple column b-tree index is organized - database

I want to understand index organization better.
Imagine we have a table with 2 columns:
CREATE TABLE user(
name varchar(100)
,age int)
We would like to create an index:
CREATE INDEX IDX_MultiColIdx on user(name,age)
What would the B-Tree index organization look like?
In the case of one column, say, age, the organization is clear: every non-leaf node would contain a set of integer keys which would be used for a search. Which values do the nodes of our IDX_MultiColIdx B-Tree index contain?

Which values do the nodes of our IDX_MultiColIdx B-Tree index contain?
Values of name, age and the row pointer (RID/ROWID or clustered key, depending on the table organization), sorted lexicographically.
How exactly they are stored depends on the data type and the database system.
Usually, CHAR is stored right-padded with spaces up to its size, while VARCHAR is prefixed with its length.
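To make the lexicographic ordering concrete, here is a minimal Python sketch that models the index entries as (name, age, row pointer) tuples. The sample rows and the integer row ids are made up for illustration, and a sorted list stands in for the leaf level of a real B-tree.

```python
import bisect

# Hypothetical rows: (row_id, name, age). The row ids stand in for RID/ROWID.
rows = [(1, "Bob", 41), (2, "Alice", 30), (3, "Bob", 25), (4, "Alice", 22)]

# Index entries are (name, age, row_id) tuples kept in lexicographic order:
# first by name, then by age within equal names, then by row pointer.
index = sorted((name, age, rid) for rid, name, age in rows)

def find_by_name(index, name):
    """Return all entries whose leading column equals `name`."""
    # (name,) sorts before every (name, age, row_id), so this lands on the
    # first entry for that name, if there is one.
    lo = bisect.bisect_left(index, (name,))
    result = []
    for entry in index[lo:]:
        if entry[0] != name:
            break
        result.append(entry)
    return result
```

Because ages are only ordered within each name, a search on name alone (or on name and age) can use this ordering, while a search on age alone cannot.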
MyISAM and some other engines can use key compression: the matching parts of a set of keys are only stored once, and the other keys only store the differing parts, like this:
Hamblin
Hamblin, California
Hamblin (surname)
Hambling Baronets
Hambly
Hambly Arena
Hambly Arena Fire
Hambo
Hambo Lama Itigelov
Hambok
Hambone
will be stored as:
Hamblin
[7], California
[7] (surname)
[7]g Baronets
Hambly
[6] Arena
[6] Arena Fire
Hambo
[5] Lama Itigelov
[5]k
[5]ne
where [x] means "take the leading x characters from the previous key".
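The scheme can be sketched in a few lines of Python. This is an illustration of the idea only, not the on-disk format of MyISAM or any other engine; the function names are hypothetical, and each stored entry is modelled as a (shared_len, suffix) pair.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def compress(keys):
    """Encode each key as (shared_len, suffix) relative to the previous key."""
    out, prev = [], ""
    for key in keys:
        shared = common_prefix_len(prev, key)
        out.append((shared, key[shared:]))
        prev = key
    return out

def decompress(entries):
    """Rebuild full keys: take `shared` leading characters of the previous key."""
    keys, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys
```

Note that this sketch always takes the longest shared prefix, so "Hambly" following "Hambling Baronets" would be encoded as (5, "y") rather than stored in full as in the listing above. A real engine applies its own rules for when to restart with a full key.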

I assume you're asking about the internal database implementation because you mention 'non-leaf nodes'.
The interior nodes in a b-tree do not need to store the full key; they only need to store separator keys. Prefix and suffix compression make interior nodes very dense, which reduces the height of the b-tree and therefore improves overall performance.
For example, given an index with the sequential keys <'A very long string', 314159> and <'Not the same string', 9348>, all the interior node needs to represent is the separation between those keys, which can be represented in a single character. In a similar way, when the keys to be separated in the interior node have a common prefix, that prefix need only be stored once, together with the point where they diverge.
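As a rough illustration of how a short separator can be chosen, the Python sketch below picks the shortest prefix of the right-hand key that still sorts after the left-hand key. The function name is hypothetical, and only the string component of the key is considered, since in the example the first column already differs.

```python
def shortest_separator(left, right):
    """Shortest prefix of `right` that sorts strictly after `left`.

    Assumes left < right. Every key <= left then compares below the
    separator, and every key >= right compares at or above it.
    """
    if not left < right:
        raise ValueError("left must sort before right")
    for i in range(1, len(right) + 1):
        candidate = right[:i]
        if left < candidate:
            return candidate
    return right  # not reached when left < right; kept for completeness
```

For the two keys in the example, the single character 'N' is enough to separate them.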
The leaf nodes need to store the full key values, and can be stored in a linked list for key order traversal. Leaf node pages can be compressed by using prefix compression or other techniques to further reduce the tree height.
For a good reference on this, see "Transaction Processing: Concepts and Techniques" by Gray & Reuter, and follow the references if you want more detail.

Related

Does ScyllaDB optimize search of clustering key?

I have a table with 3 clustering keys:
K1 K2 C1 C2 C3 V1 V2
Where K1 & K2 are the partition keys, C1, C2 and C3 are the clustering keys and V1 and V2 are two value columns.
In this example, C1, C2 and C3 represent the 3 coordinates of some shape (where each coordinate is a number from 1 to approx. 500). Each partition key in our table is linked to several hundred different values.
If I want to search the partition (K1, K2) for a row whose clustering key is C1 = 50, C2 = 450 and C3 = 250, how would Scylla execute this search, assuming the clustering key is sorted from lowest to highest (ASC order)? Does Scylla always start searching from the beginning of the column to see whether a given key exists? In other words, I’m assuming Scylla will first search the C1 column for the value 50. If Scylla reaches a value of 51 or higher without having found 50, it could stop searching the rest of the C1 column, since the data is sorted and there’s no way for the value 50 to appear after 51. In this case, it would not even need to check whether C2 contains the value 450, since we need all 3 clustering columns to match. If, however, C1 contains the value 50, it will move on to C2 and search (once again starting from the first entry of the C2 column) whether C2 contains the value 450.
However, when C1 = 50 it would indeed be more efficient to start from the beginning. But when C2 = 450 (and the highest index = 500) it would be more efficient to start from the end. This is assuming Scylla “knows” the lowest / highest value for each of the clustering columns.
Does Scylla, in fact, optimize search in this fashion or does it take an entirely different approach?
Perhaps another way to phrase the question is as follows: Does it normally take Scylla longer to search for C1 = 450 vs. C1 = 50? Obviously, since we only have a small dataset in this example the effect won’t be huge, but if the dataset contained tens of thousands of entries the effect would be more pronounced.
Thanks
Scylla has two data structures to search when executing a query on the replica:
The in-memory data structure, used by row-cache and memtables
The on-disk data structure, used by SStables
In both cases, data is organized on two levels: on the partition and row level. So Scylla will first do a lookup of the sought-after partition, then do a lookup of the sought-after row (by its clustering key) in the already looked-up partition.
Both of these data structures are sorted on both levels and in general the lookup happens via a binary search.
In memory we use a B-tree, in which binary search works as expected.
The SStable files include two indexes: an index, which covers all the partitions found in the data file, and a so-called "summary", which is a sampled index of the index; the latter is always kept in memory in its entirety. Furthermore, if the partition has a lot of rows, the index will contain a so-called "promoted index", which is a sampled index of the clustering keys found therein. So a query will first locate the "index-page" using a binary search in the in-memory summary. The index-page is the portion of the index file which contains the index entry for the sought-after partition. This index-page is then read linearly until the index entry of interest is found. This gives us the start position of the partition in the data file. If the index entry also contains a promoted index, we can furthermore do a lookup in that to get a position closer to the start of said row in the data file. We then start parsing the data file at the position we got until the given row is found.
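The summary-then-index lookup described above can be sketched roughly as follows in Python. This is a simplified model, not Scylla's code: the sampling interval, the key names and the offsets are invented, partitions are ordered by a plain string key for simplicity, and the promoted index is left out.

```python
import bisect

SAMPLE_EVERY = 4  # assumed sampling interval, chosen for illustration

# Hypothetical index: sorted (partition_key, data_file_offset) entries.
index = [(f"pk{i:03d}", i * 100) for i in range(20)]

# Summary: every SAMPLE_EVERY-th key plus its position in the index.
summary_keys = [index[i][0] for i in range(0, len(index), SAMPLE_EVERY)]
summary_pos = list(range(0, len(index), SAMPLE_EVERY))

def lookup(partition_key):
    """Return the data-file offset for `partition_key`, or None."""
    # Binary search the in-memory summary for the last sample <= key.
    s = bisect.bisect_right(summary_keys, partition_key) - 1
    if s < 0:
        return None
    start = summary_pos[s]
    # Read the index page linearly until the entry is found.
    for key, offset in index[start:start + SAMPLE_EVERY]:
        if key == partition_key:
            return offset
    return None
```

The binary search over the summary takes about the same number of steps whether the key is near the start or the end of the sorted range, which is why a lookup does not need to "start from the beginning".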
Note that clustering keys are not found one column at a time. A clustering key, regardless of how many components it has, is treated as a single value: a tuple of 1+ components. When a query doesn't specify all the components, an incomplete tuple called a "prefix" is created. This can be compared to other partial or full keys.

Single word indexing technique

Suppose I have a large array where each element is one word, and I want to build an index.
Take the word Water: I can write a function that returns
w
wa
wat
wate
water
at
ate
ater
ter
er
r
and those results would be keys in a hash table where the values are arrays of words that contain the key.
Given that I don't care about memory consumption, and the data is read-only, i.e. inserted only at app startup:
theoretically, what would beat this technique in terms of lookup performance?
what is the name of this technique?
I think you're looking for a Trie:
a trie, also called digital tree and sometimes radix tree or prefix
tree (as they can be searched by prefixes), is a kind of search
tree—an ordered tree data structure that is used to store a dynamic
set or associative array where the keys are usually strings.
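A plain trie answers prefix queries. To cover the substring lookups from the question, one option (an assumption about how the trie would be applied here, not something the answer spells out) is to insert every suffix of every word. A minimal Python sketch with hypothetical names:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.words = set()   # words that contain the substring ending here

def build_substring_trie(words):
    """Insert every suffix of every word; each node records the source word."""
    root = TrieNode()
    for word in words:
        for start in range(len(word)):
            node = root
            for ch in word[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.words.add(word)
    return root

def find(root, fragment):
    """Return the set of words containing `fragment` as a substring."""
    node = root
    for ch in fragment:
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.words)
```

A lookup walks one node per character of the fragment, and fragments that share a prefix share nodes instead of being stored as separate hash keys.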

How do I index variable length strings, integers, binaries in b-tree?

I am creating a database storage engine (for fun).
I know it uses b-trees (and stuff), but all the basic b-tree examples show keys being sorted and then stored for indexing, and only for integers.
I can understand sorting integers, but how do I do it for strings, if I have a string as the key for indexing?
Example: I want to index all email addresses in a b-tree. How would I do that?
It does not matter what type of data you are sorting. For a B-Tree you only need a comparator. The first value you put into your db is the root. The second value gets compared to the root. If smaller, then continue down left, else right. Inserting new values often requires restructuring your tree.
A comparator for a string could use the length of the string or compare it alphabetically or count the dots in an email behind the at-sign.
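As a small illustration of the comparator idea, the Python sketch below keeps the keys of a single node sorted using a pluggable three-way comparator. The names are hypothetical, the domain-then-local-part ordering is just one possible choice, and node splitting and rebalancing are left out.

```python
def compare_email(a, b):
    """Three-way comparator: order by domain, then by local part."""
    a_local, _, a_domain = a.partition("@")
    b_local, _, b_domain = b.partition("@")
    ka, kb = (a_domain, a_local), (b_domain, b_local)
    return (ka > kb) - (ka < kb)

def insert_sorted(keys, key, compare):
    """Binary-search a node's sorted key list with `compare` and insert `key`."""
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(keys[mid], key) < 0:
            lo = mid + 1
        else:
            hi = mid
    keys.insert(lo, key)
    return lo
```

Swapping in a different comparator (plain alphabetical, by length, and so on) changes the order of the keys without touching the insertion logic.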

Searching a file and returning value - Super Fast

I have a set of data which has a name, some sub-values and then an associated numeric value. For example:
James Value1 Value2 "1.232323/1.232334"
Jim Value1 Value2 "1.245454/1.232999"
Dave Value1 Value2 "1.267623/1.277777"
There will be around 100,000 entries like this stored in either a file or database. I would like to know, what is the quickest way of being able to return the results which match a search, along with their associated numeric value.
For example, a query of "J" would return both the James and Jim results, with the numeric values in the last column.
I've heard people mention binary tree searching, dictionary searching, indexed searching. I have no idea which is a good route to pursue.
This is a poorly characterized problem. As with many optimization problems, there are trade-offs in resources. If you truly want the fastest response possible, then a likely approach is to compile all possible searches into a table of prepared results, so that, given a search key, you can look the search key up in the table and return the result.
Assuming your character set is limited to A-Z and a-z, a table with an entry for each search key from 0 to 4 characters will use a modest amount of memory by today’s standards. Each table entry merely needs to have two values in it: The start and end positions in a list of the numeric values. (Compile the list in this way: Sort the records by the name field. Extract just the numeric values from the records, maintaining the order, putting them in a list. Any search key must return a sublist of contiguous records from that list. This is because the search is for a prefix string of the name field, so any records that match the search key are adjacent, when sorted by the name field.)
Thus, to create a table to look up any key of 0 to 4 characters, you need fewer than 53^4 entries in a table of pairs, where each member of the pair contains a record number (32 bits or fewer). So 8 • 53^4 bytes ≈ 60.2 MiB suffices. (53 is because you have 52 characters plus one sentinel character to mark the end of the key. Alternate encodings could reduce this some.)
To support keys of more than 4 characters, you need to extend this. With typical data, 4 characters will have narrowed down the search greatly, so you can take the set of records indicated by the first 4 characters and prune it to get the final results. If the data has pathological cases where 4 characters does not reduce the search much, you can embellish this technique.
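A minimal Python sketch of the prepared-results table follows. For brevity it uses a dict keyed by prefix rather than the dense 53^4-entry array described above, so it illustrates the lookup logic, not the memory layout. The function names are hypothetical.

```python
MAX_PREFIX = 4  # table covers search keys of 0 to 4 characters

# Size of the dense table from the text: 8 bytes per (start, end) pair.
dense_table_bytes = 8 * 53**4  # 63,123,848 bytes, about 60.2 MiB

def build_prefix_table(records):
    """records: list of (name, value). Returns (sorted_records, table)."""
    recs = sorted(records)
    table = {}
    for i, (name, _) in enumerate(recs):
        # Register this record under each of its prefixes of length 0..4.
        for n in range(min(len(name), MAX_PREFIX) + 1):
            start, _ = table.get(name[:n], (i, i))
            table[name[:n]] = (start, i + 1)
    return recs, table

def search(recs, table, key):
    """Return records whose name starts with `key`."""
    start, end = table.get(key[:MAX_PREFIX], (0, 0))
    # For longer keys, prune the candidate range found via the first 4 chars.
    return [r for r in recs[start:end] if r[0].startswith(key)]
```

With the three sample records from the question, searching for "J" returns the James and Jim records as one contiguous slice of the sorted list.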
So, is that really what you want to do, make the speed as fast as possible, regardless of other resources (including engineering time) consumed? If not, what are your actual goals?

Data structure to handle the requirement of following use case

All the records in a database are saved in (key, value) pair format. Records can always be retrieved by specifying the key. A data structure needs to be developed to handle the following scenarios:
1. Access all the records in a linear fashion (an array or linked list is the best data structure for this scenario, giving access in O(N) time)
2. Retrieve a record by providing the key (a hash table can be implemented to index it in O(1) complexity)
3. Retrieve the set of records for a value at a particular position in the key. Example: list all records for which the 2nd digit (10's place) of the key is 5. If the keys are 256, 1452, 362, 874, the records for keys 256 and 1452 should be returned
I am assuming your keys are at most d digits long (in decimal).
How about a normal hashtable and an additional 10*d two dimensional array (let's call it A) of sets. A[i][j] is the set of keys which have digit i in the jth position. The sets can support O(1) insert/delete if implemented themselves as hashtables.
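A minimal Python sketch of this layout is shown below. The class and method names are hypothetical, keys are assumed to be non-negative integers of at most max_digits digits, and position 0 is taken to be the ones place so that position 1 matches the "10's place" in the question.

```python
class DigitIndexedStore:
    def __init__(self, max_digits):
        self.records = {}  # the normal hash table: key -> value
        # A[i][j] = set of keys with digit i at position j (0 = ones place)
        self.A = [[set() for _ in range(max_digits)] for _ in range(10)]

    def _digits(self, key):
        """Yield (digit, position) pairs, position 0 being the ones place."""
        for position, ch in enumerate(reversed(str(key))):
            yield int(ch), position

    def insert(self, key, value):
        self.records[key] = value
        for digit, position in self._digits(key):
            self.A[digit][position].add(key)

    def delete(self, key):
        del self.records[key]
        for digit, position in self._digits(key):
            self.A[digit][position].discard(key)

    def keys_with_digit(self, digit, position):
        """Return a copy of the set of keys with `digit` at `position`."""
        return set(self.A[digit][position])
```

Python sets are hash-based, so each add or discard is expected O(1), and an insert or delete touches at most d of the sets.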
For 1 and 2, I think Linked Hash Map is a good choice.
For point 3, add an additional hash map with a (digit, position) tuple as the key and a list of pointers to the matching values as the value.
Both data structures can be wrapped inside one, and both will point to the same data, of course.
Store the keys in a trie. For the numbers in your example (assuming 4 digit numbers) it looks like this:
*root*
|
0 -- 2 - 5 - 6
|  |
|  +- 3 - 6 - 2
|  |
|  +- 8 - 7 - 4
|
1 - 4 - 5 - 2
This data structure can be traversed in a way that returns (1) or (3). It won't be quite as fast for (3) as would maintaining an index for each digit, so I guess it's a question of whether space or lookup time is your primary concern. For (2), it is already O(log n), but if you need O(1), you could store the keys in both the trie and a hash table.
The first thing that comes to mind is embedding a pair of nodes in each record. One of the nodes would be a part of a tree sorted by the record index and the other, part of a tree sorted by the record key. You can then have quick access to the records by index or key using these trees. With this you can also quickly visit records in sequential index or key order. This covers the first and second requirement.
You can add another node for a tree of records whose values contain 5 in the tens position. That covers the third requirement.
Extra benefit: the same tree handling code will be used in all cases.
A dictionary (hash map, etc.) would easily handle those requirements, although your third requirement would be an O(N) operation. You just iterate over the keys and select those that match your criteria. You don't say what your desired performance is for that case.
But O(N) might be plenty fast enough. How many items are in your data set, and how often will you be performing that third function?
