Does ScyllaDB optimize search of clustering key?

I have a table with 3 clustering keys:
K1 K2 C1 C2 C3 V1 V2
Where K1 & K2 are the partition keys, C1, C2 and C3 are the clustering keys and V1 and V2 are two value columns.
In this example, C1, C2 and C3 represent the 3 coordinates of some shape (where each coordinate is a number from 1 to approx. 500). Each partition key in our table is linked to several hundred different values.
If I want to search for a row of K1 & K2 that has a clustering key equal to C1 = 50, C2 = 450 and C3 = 250, how would Scylla execute this search, assuming the clustering key is sorted from lowest to highest (ASC order)? Does Scylla always start searching from the beginning of the column to see whether a given key exists? In other words, I’m assuming Scylla will first search the C1 column for the value 50. If Scylla reaches a value of 51+ without finding 50 in C1, it could stop searching the rest of C1's data, since the data is sorted and there’s no way for the value 50 to appear after 51. In that case, it would not even need to check whether C2 contains the value 450, since we need all 3 clustering columns to match. If, however, C1 does contain the value 50, it will move on to C2 and search (once again starting from the first entry of the C2 column) whether C2 contains the value 450.
However, when C1 = 50 it would indeed be more efficient to start from the beginning, but when C2 = 450 (and the highest value is 500) it would be more efficient to start from the end. This assumes Scylla “knows” the lowest / highest value of each clustering column.
Does Scylla, in fact, optimize search in this fashion or does it take an entirely different approach?
Perhaps another way to phrase the question is as follows: Does it normally take Scylla longer to search for C1 = 450 vs. C1 = 50? Obviously, since we only have a small dataset in this example the effect won’t be huge, but if the dataset contained tens of thousands of entries the effect would be more pronounced.
Thanks

Scylla has two data structures to search when executing a query on the replica:
The in-memory data structure, used by row-cache and memtables
The on-disk data structure, used by SSTables
In both cases, data is organized at two levels: the partition level and the row level. So Scylla will first do a lookup of the sought-after partition, then do a lookup of the sought-after row (by its clustering key) in the already looked-up partition.
Both of these data structures are sorted on both levels and in general the lookup happens via a binary search.
We use a B-tree in memory; binary search in it works as expected.
The SSTable files include two indexes: an index, which covers all the partitions found in the data file, and a so-called "summary", which is a sampled index of the index; the latter is always kept in memory in its entirety. Furthermore, if the partition has a lot of rows, the index will contain a so-called "promoted index", which is a sampled index of the clustering keys found therein. So a query will first locate the "index page" using a binary search in the in-memory summary. The index page is the portion of the index file which contains the index entry for the sought-after partition. This index page is then read linearly until the index entry of interest is found. This gives us the start position of the partition in the data file. If the index entry also contains a promoted index, we can furthermore do a lookup in that to get a position closer to the start of the sought-after row in the data file. We then start parsing the data file at that position until the given row is found.
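To make that flow concrete, here is a rough model of the lookup path in Python. It only illustrates the summary → index page → promoted index → data file sequence described above; it is not Scylla's actual code, and all names and structures in it are invented for the sketch.

from bisect import bisect_right

# Toy model of the SSTable lookup path (not Scylla's real data layout).
# summary: in-memory, sorted list of (first_partition_key_in_page, index_page).
# index_page: list of (partition_key, data_position, promoted_index) entries.
# promoted_index: sorted list of (clustering_key_sample, data_position), or None.
# data_file: list of (partition_key, clustering_key, row), sorted like an SSTable.

def find_row(summary, data_file, partition_key, clustering_key):
    # 1. Binary search the in-memory summary to locate the index page.
    first_keys = [k for k, _ in summary]
    page = bisect_right(first_keys, partition_key) - 1
    if page < 0:
        return None
    index_page = summary[page][1]

    # 2. Read the index page linearly until the sought-after partition's entry is found.
    for entry_key, pos, promoted in index_page:
        if entry_key == partition_key:
            break
    else:
        return None

    # 3. If there is a promoted index, use it to jump closer to the sought row.
    if promoted:
        samples = [ck for ck, _ in promoted]
        i = bisect_right(samples, clustering_key) - 1
        if i >= 0:
            pos = promoted[i][1]

    # 4. Parse the data file from that position until the row is found
    #    (or we walk past where it would have to be).
    for pk, ck, row in data_file[pos:]:
        if pk != partition_key or ck > clustering_key:
            return None
        if ck == clustering_key:
            return row
    return None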
Note that clustering keys are not found one column at a time. A clustering key, regardless of how many components it has, is treated as a single value: a tuple of 1+ components. When a query doesn't specify all the components, an incomplete tuple called a "prefix" is created. This can be compared to other partial or full keys.
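A small illustration of that last point (an assumed model, not Scylla internals): if you picture the rows of one partition as a list sorted by the full clustering tuple, both complete keys and prefixes are located with the same binary search over tuples.

from bisect import bisect_left, bisect_right

# Rows of one partition, sorted by the whole clustering key (C1, C2, C3).
rows = [
    ((50, 450, 250), "row a"),
    ((50, 451, 10),  "row b"),
    ((51, 10, 10),   "row c"),
]
keys = [k for k, _ in rows]

# Full key: one binary search over tuples. C1, C2 and C3 are compared as a single
# value, so looking up C1 = 450 costs the same as looking up C1 = 50.
i = bisect_left(keys, (50, 450, 250))
print(rows[i][1])                          # row a

# Prefix (only C1 given): the same kind of search finds the contiguous matching slice.
lo = bisect_left(keys, (50,))
hi = bisect_right(keys, (50, float("inf"), float("inf")))
print([r for _, r in rows[lo:hi]])         # all rows with C1 = 50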

Related

Searching a file and returning value - Super Fast

I have a set of data which has a name, some sub-values and then an associated numeric value. For example:
James Value1 Value2 "1.232323/1.232334"
Jim Value1 Value2 "1.245454/1.232999"
Dave Value1 Value2 "1.267623/1.277777"
There will be around 100,000 entries like this stored in either a file or database. I would like to know, what is the quickest way of being able to return the results which match a search, along with their associated numeric value.
For example, a query of "J" would return both the James and Jim results, along with the numeric values in the last column.
I've heard people mention binary tree searching, dictionary searching, indexed searching. I have no idea which is a good route to pursue.
This is a poorly characterized problem. As with many optimization problems, there are trade-offs in resources. If you truly want the fastest response possible, then a likely approach is to compile all possible searches into a table of prepared results, so that, given a search key, you can look the search key up in the table and return the result.
Assuming your character set is limited to A-Z and a-z, a table with an entry for each search key from 0 to 4 characters will use a modest amount of memory by today’s standards. Each table entry merely needs to have two values in it: The start and end positions in a list of the numeric values. (Compile the list in this way: Sort the records by the name field. Extract just the numeric values from the records, maintaining the order, putting them in a list. Any search key must return a sublist of contiguous records from that list. This is because the search is for a prefix string of the name field, so any records that match the search key are adjacent, when sorted by the name field.)
Thus, to create a table to look up any key of 0 to 4 characters, you need fewer than 53^4 entries in a table of pairs, where each member of the pair contains a record number (32 bits or fewer). So 8•53^4 bytes ≈ 60.2 MiB suffices. (53 is because you have 52 characters plus one sentinel character to mark the end of the key. Alternate encodings could reduce this some.)
To support keys of more than 4 characters, you need to extend this. With typical data, 4 characters will have narrowed down the search greatly, so you can take the set of records indicated by the first 4 characters and prune it to get the final results. If the data has pathological cases where 4 characters do not reduce the search much, you can embellish this technique.
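Here is a minimal sketch of that approach (names and structure are made up for illustration; it keys a dict only by prefixes that actually occur instead of allocating the dense 53^4 table): sort the records by name once, keep the values in that order, and map each prefix of up to 4 characters to the (start, end) range of its contiguous sublist.

# Records as (name, numeric_value); sketch of the precomputed prefix -> range table.
records = [
    ("James", "1.232323/1.232334"),
    ("Jim",   "1.245454/1.232999"),
    ("Dave",  "1.267623/1.277777"),
]
records.sort(key=lambda r: r[0])            # sort once by name

# Map every prefix of 0 to 4 characters that occurs to the [start, end) range of
# matching records. Because the list is sorted by name, each range is contiguous.
table = {}
for i, (name, _) in enumerate(records):
    for plen in range(5):
        prefix = name[:plen]
        start, end = table.get(prefix, (i, i))
        table[prefix] = (min(start, i), max(end, i + 1))

def search(key):
    # Keys longer than 4 characters: narrow with the first 4, then prune the rest.
    start, end = table.get(key[:4], (0, 0))
    return [(n, v) for n, v in records[start:end] if n.startswith(key)]

print(search("J"))                          # [('James', ...), ('Jim', ...)]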
So, is that really what you want to do, make the speed as fast as possible, regardless of other resources (including engineering time) consumed? If not, what are your actual goals?

Data structure to handle the requirement of following use case

All the records in a database are saved in (key, value) pair format. Records can always be retrieved by specifying the key. A data structure needs to be developed to handle the following scenarios:
1. Access all the records in a linear fashion (an array or linked list is the best data structure for this scenario, giving O(N) access)
2. Retrieve the record by providing the key (a hash table can be implemented to index it with O(1) complexity)
3. Retrieve a set of records for a value at a particular position in the key. Ex: list all records for which the 2nd digit (10's place) of the key is 5; if the keys are 256, 1452, 362, 874, the records for keys 256 and 1452 should be returned
I am assuming your keys are at most d digits long (in decimal).
How about a normal hash table and an additional 10×d two-dimensional array (let's call it A) of sets, where A[i][j] is the set of keys which have digit i in the jth position? The sets can support O(1) insert/delete if implemented themselves as hashtables.
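A rough sketch of that layout (illustrative names only): a normal dict for key → record, plus the 10×d grid of sets, where A[digit][pos] holds the keys that have that digit at decimal position pos (pos 0 being the units, pos 1 the 10's place).

D = 4                                        # keys are at most D digits long

records = {}                                 # normal hash table: key -> record
A = [[set() for _ in range(D)] for _ in range(10)]   # A[digit][pos] -> set of keys

def insert(key, record):
    records[key] = record
    for pos in range(D):                     # O(d) work, O(1) per set insert
        digit = (key // 10 ** pos) % 10
        A[digit][pos].add(key)

for key, rec in [(256, "r1"), (1452, "r2"), (362, "r3"), (874, "r4")]:
    insert(key, rec)

# Requirement 3: records whose 10's-place digit is 5.
print(sorted(A[5][1]))                       # [256, 1452]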
For 1 and 2, I think Linked Hash Map is a good choice.
For point 3, an additional hash map with a (digit, position) tuple as the key and a list of pointers to the values.
Both data structures can be wrapped inside one, and both will point to the same data, of course.
Store the keys in a trie. For the numbers in your example (assuming 4 digit numbers) it looks like this:
*root*
  |
  0 -- 2 - 5 - 6
  |    |
  |    +- 3 - 6 - 2
  |    |
  |    +- 8 - 7 - 4
  |
  1 - 4 - 5 - 2
This data structure can be traversed in a way that returns (1) or (3). It won't be quite as fast for (3) as would maintaining an index for each digit, so I guess it's a question of whether space or lookup time is your primary concern. For (2), it is already O(log n), but if you need O(1), you could store the keys in both the trie and a hash table.
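A compact sketch of that trie (helper names invented here): a plain depth-first walk returns the keys in order for (1), and the same walk constrained at one depth answers (3).

# Trie over the keys 0256, 0362, 0874, 1452 (zero-padded to 4 digits).
def build(keys, width=4):
    root = {}
    for k in keys:
        node = root
        for ch in str(k).zfill(width):
            node = node.setdefault(ch, {})
        node["$"] = k                        # marks a complete key
    return root

def walk(node, depth=0, pos=None, digit=None):
    # Depth-first walk in key order; if pos/digit are given, only the branch with
    # that digit is followed at that depth (requirement 3).
    if "$" in node:
        yield node["$"]
    for ch in sorted(c for c in node if c != "$"):
        if pos is not None and depth == pos and ch != digit:
            continue
        yield from walk(node[ch], depth + 1, pos, digit)

trie = build([256, 1452, 362, 874])
print(list(walk(trie)))                      # (1): 256, 362, 874, 1452 in key order
print(list(walk(trie, pos=2, digit="5")))    # (3): 10's place is 5 -> 256, 1452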
The first thing that comes to mind is embedding a pair of nodes in each record. One of the nodes would be a part of a tree sorted by the record index and the other, part of a tree sorted by the record key. You can then have quick access to the records by index or key using these trees. With this you can also quickly visit records in sequential index or key order. This covers the first and second requirement.
You can add another node for a tree of records whose values contain 5 in the tens position. That covers the third requirement.
Extra benefit: the same tree handling code will be used in all cases.
A dictionary (hash map, etc.) would easily handle those requirements, although your third requirement would be an O(N) operation. You just iterate over the keys and select those that match your criteria. You don't say what your desired performance is for that case.
But O(N) might be plenty fast enough. How many items are in your data set, and how often will you be performing that third function?

DB - Query processing - Index Nested Loop Join

I have a relation R with 10 blocks. S with 1000
I have also 50 unique records for attribute A in relation R, and 5000 unique records for attribute A in relation S.
I have 100 records on each block.
Note that we assume a uniform distribution of the different values in each relation.
S has a clustering index on the join attribute A.
The question is: how many blocks of S store any of the records that participate in the join with R? I need to answer with the best and the worst case.
I thought that if R has 50 unique values for A, and A is the clustering index, it will take a minimum of 1 block for each unique value and a maximum of 2, so the answer is 50 or 100.
But why can't I put 5 unique records in each block, so that the maximum number of blocks is 10?
As far as I understand this is the situation:
S has 1000 blocks with 100 records/block, which leads to 100,000 records (max). Among those 100,000 records there are 5000 unique (different) values for attribute A.
Edit:
If they are all evenly distributed, every unique value for A would have 20 rows in S. If all of the 50 unique values for A in R are present in S, then 50 row groups would be fetched.
In the best case they are all stored together (thanks to the clustered index) and you need to read 10 blocks (50 values for A × 20 rows per value in S / 100 records per block = 10 blocks).
In the worst case the 20 rows for every value of A span 2 blocks, which would lead to 100 blocks you need to read from S.
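As a quick back-of-the-envelope check of those two numbers (this just restates the arithmetic above):

rows_in_S        = 1000 * 100        # blocks * records per block = 100,000 rows
rows_per_a_value = rows_in_S // 5000 # uniform distribution -> 20 rows per value of A
joining_values   = 50                # distinct values of A coming from R

best  = joining_values * rows_per_a_value // 100   # groups packed contiguously -> 10 blocks
worst = joining_values * 2                         # each 20-row group straddles 2 blocks -> 100
print(best, worst)                                 # 10 100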
To your second question:
Since you have a clustered index containing the column A, all rows with the same value for A will be stored together. They only use more than one block if they don’t fit into one, or if the block was already partly filled by other values so that they can’t all fit in one block.
Attention: I may not have fully understood your initial question, and therefore my answer could be totally wrong!

How multiple column b-tree index is organized

I want to understand index organization better.
Imagine we have a table with 2 columns:
CREATE TABLE user(
name varchar(100)
,age int)
We would like to create an index:
CREATE INDEX IDX_MultiColIdx on user(name,age)
What would the B-Tree index organization look like?
In the case of one column, say age, the organization is clear: every non-leaf node would contain a set of integer keys which would be used for the search. What values do the nodes of our IDX_MultiColIdx B-Tree index contain?
What values do the nodes of our IDX_MultiColIdx B-Tree index contain?
Values of name, age and the row pointer (RID/ROWID or clustered key, depending on the table organization), sorted lexicographically.
How exactly they are stored depends on the datatype and the database system.
Usually, CHAR is stored right-padded with spaces up to its size, while VARCHAR is prepended with its length.
MyISAM and some other engines can use key compression: the matching parts of a set of keys are only stored once, and the other keys only store the differing parts, like this:
Hamblin
Hamblin, California
Hamblin (surname)
Hambling Baronets
Hambly
Hambly Arena
Hambly Arena Fire
Hambo
Hambo Lama Itigelov
Hambok
Hambone
will be stored as:
Hamblin
[7], California
[7] (surname)
[7]g Baronets
Hambly
[6] Arena
[6] Arena Fire
Hambo
[5] Lama Itigelov
[5]k
[5]ne
where [x] means "take the leading x characters from the previous key".
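As a rough illustration of that idea (a simplified front-coding sketch, not MyISAM's actual on-disk format): one simple rule that reproduces the list above is to keep a "base" key, store a key in full whenever it does not start with the current base (making it the new base), and otherwise store only the shared length and the differing tail.

def compress(keys):
    out, base = [], None
    for key in keys:
        if base and key.startswith(base):
            out.append("[%d]%s" % (len(base), key[len(base):]))
        else:
            out.append(key)                  # stored in full; becomes the new base
            base = key
    return out

def decompress(stored):
    # "[x]tail" means: take the leading x characters of the previous key, append tail.
    out, prev = [], ""
    for item in stored:
        if item.startswith("["):
            n, tail = item[1:].split("]", 1)
            prev = prev[:int(n)] + tail
        else:
            prev = item
        out.append(prev)
    return out

keys = ["Hamblin", "Hamblin, California", "Hamblin (surname)", "Hambling Baronets",
        "Hambly", "Hambly Arena", "Hambly Arena Fire",
        "Hambo", "Hambo Lama Itigelov", "Hambok", "Hambone"]
assert decompress(compress(keys)) == keys    # round-trips the example above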
I assume you're asking about the internal database implementation because you mention 'non-leaf nodes'.
The interior nodes in a b-tree do not need to store the full key; they only need to store separator keys. Prefix and suffix compression mean that interior nodes can be very dense, and therefore reduce the height of the b-tree and therefore improve overall performance.
For example, given an index with the sequential keys <'A very long string', 314159> and <'Not the same string', 9348>, all the interior node needs to represent is the separation between those keys, which can be represented in a single character. In a similar way, when the keys to be separated in the interior node have a common prefix, that prefix need only be stored once and only the point where they diverge represented.
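A tiny sketch of that separator idea (illustrative only, not any particular engine's algorithm): take the shortest prefix of the right-hand key that still sorts strictly above the left-hand key; keys below the separator belong in the left child, keys at or above it in the right.

def shortest_separator(left, right):
    # Shortest prefix of 'right' that is still greater than 'left'.
    assert left < right
    for i in range(1, len(right) + 1):
        if right[:i] > left:
            return right[:i]
    return right

print(shortest_separator("A very long string", "Not the same string"))   # 'N'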
The leaf nodes need to store the full key values, and can be stored in a linked list for key order traversal. Leaf node pages can be compressed by using prefix compression or other techniques to further reduce the tree height.
For a good reference on this, see "Transaction Processing: Concepts and Techniques" by Gray & Reuter, and follow the references if you want more detail.

Physical storage of data in Access 2007

I've been trying to estimate the size of an Access table with a certain number of records.
It has 4 Longs (4 bytes each), and a Currency (8 bytes).
In theory: 1 Record = 24 bytes, 500,000 = ~11.5MB
However, the accdb file (even after compacting) increases by almost 30MB (~61 bytes per record). A few extra bytes for padding wouldn't be so bad, but 2.5X seems a bit excessive - even for Microsoft bloat.
What's with the discrepancy? The four longs are compound keys, would that matter?
This is the result of my tests, all conducted with an A2003 MDB, not with A2007 ACCDB:
98,304 IndexTestEmpty.mdb
131,072 IndexTestNoIndexesNoData.mdb
11,223,040 IndexTestNoIndexes.mdb
15,425,536 IndexTestPK.mdb
19,644,416 IndexTestPKIndexes1.mdb
23,838,720 IndexTestPKIndexes2.mdb
24,424,448 IndexTestPKCompound.mdb
28,041,216 IndexTestPKIndexes3.mdb
28,655,616 IndexTestPKCompoundIndexes1.mdb
32,849,920 IndexTestPKCompoundIndexes2.mdb
37,040,128 IndexTestPKCompoundIndexes3.mdb
The names should be pretty self-explanatory, I think. I used an append query with Rnd() to append 524,288 records of fake data, which made the file 11MBs. The indexes I created on the other fields were all non-unique. But as you can see, the compound 4-column index increased the size from 11MBs (no indexes) to well over 24MBs. A PK on the first column increased the size only from 11MBs to 15.4MBs (using fake MBs, of course, i.e., like hard drive manufacturers).
Notice how each single-column index added approximately 4MBs to the file size. If you consider that 4 columns with no indexes totalled 11MBs, that seems about right based on my comment above, i.e., that each index should increase the file size by about the amount of data in the field being indexed. I am surprised that the clustered index did this, too -- I thought that the clustered index would use less space, but it doesn't.
For comparison, a non-PK (i.e., non-clustered) unique index on the first column, starting from IndexTestNoIndexes.mdb is exactly the same size as the database with the first column as the PK, so there's no space savings from the clustered index at all. On the off chance that perhaps the ordinal position of the indexed field might make a difference, I also tried a unique index on the second column only, and this came out exactly the same size.
Now, I didn't read your question carefully, and omitted the Currency field, but if I add that to the non-indexed table and the table with the compound index and populate it with random data, I get this:
98,304 IndexTestEmpty.mdb
131,072 IndexTestNoIndexesNoData.mdb
11,223,040 IndexTestNoIndexes.mdb
15,425,536 IndexTestPK.mdb
15,425,536 IndexTestIndexUnique2.mdb
15,425,536 IndexTestIndexUnique1.mdb
15,482,880 IndexTestNoIndexes+Currency.mdb
19,644,416 IndexTestPKIndexes1.mdb
23,838,720 IndexTestPKIndexes2.mdb
24,424,448 IndexTestPKCompound.mdb
28,041,216 IndexTestPKIndexes3.mdb
28,655,616 IndexTestPKCompoundIndexes1.mdb
28,692,480 IndexTestPKCompound+Currency.mdb
32,849,920 IndexTestPKCompoundIndexes2.mdb
37,040,128 IndexTestPKCompoundIndexes3.mdb
The points of comparison are:
11,223,040 IndexTestNoIndexes.mdb
15,482,880 IndexTestNoIndexes+Currency.mdb
24,424,448 IndexTestPKCompound.mdb
28,692,480 IndexTestPKCompound+Currency.mdb
So, the currency field added another 4.5MBs, and its index added another 4MBs. And if I add non-unique indexes to the 2nd, 3rd and 4th long fields, the database grows to 41,336,832 bytes, an increase in size of just under 12MBs (or ~4MBs per additional index).
So, this basically replicates your results, no? And I ended up with the same file sizes, roughly speaking.
The answer to your question is INDEXES, though there is obviously more overhead in the A2007 ACCDB format, since I saw an increase in size of only 20MBs, not 30MBs.
One thing I did notice was that I could implement an index that would make the file larger, then delete the index and compact, and it would return to exactly the same file size as it had before, so you should be able to take a single copy of your database and experiment with what removing the indexes does to your file size.
