Can we model the relationship between (heap size vs latency) and (heap size vs throughput) using the Universal Scalability Law? - theory

I have a data set of the form (heap size vs latency) and (heap size vs throughput).
As I have read in several articles, the Universal Scalability Law models throughput against concurrency (or the number of nodes). Can I use the USL equation to find the relationship between (heap size vs latency) and (heap size vs throughput)?
Thanks

No, you can't.
The Universal Scalability Law (USL) has two variations: USL for hardware and USL for software.
In the USL for hardware, you get the relationship between the number of nodes and throughput (keeping the concurrency per node constant).
In the USL for software, you get the relationship between the level of concurrency and throughput (keeping the node count constant).
In both forms, the independent variable is a load or scale factor (nodes or concurrency); heap size is neither, so the USL equation doesn't model (heap size vs latency) or (heap size vs throughput).
Using Little's Law, you can also derive the corresponding relationship for average latency.
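For reference, here is a minimal sketch of the USL throughput model and the latency you can derive from it via Little's Law. The parameter values are invented for illustration; in practice you fit alpha and beta to measured throughput-vs-concurrency (or throughput-vs-node-count) data, and no think time is assumed.

```python
# Universal Scalability Law: relative capacity as a function of load N
# (N = concurrency in the software form, node count in the hardware form):
#   C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1))
# alpha models contention, beta models coherency (crosstalk) cost.

def usl_throughput(n, alpha, beta, lambda_1=1.0):
    """Throughput at load n, given throughput lambda_1 at n = 1."""
    return lambda_1 * n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def mean_latency(n, alpha, beta, lambda_1=1.0):
    """Little's Law: N = X * R, so R = N / X(N) (assuming no think time)."""
    return n / usl_throughput(n, alpha, beta, lambda_1)

# Illustrative parameters only -- they must be fitted to measured
# (concurrency, throughput) data, not to (heap size, throughput) data.
alpha, beta = 0.05, 0.001
for n in (1, 8, 32, 128):
    print(n, usl_throughput(n, alpha, beta), mean_latency(n, alpha, beta))
```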

Related

What is the Likelihood of Multiple Compactions Overlapping in ScyllaDB?

In the open source version, Scylla recommends keeping up to 50% of disk space free for "compactions". At the same time, the documentation states that each table is compacted independently of the others. Logically, this suggests that in an application with dozens (or even just multiple) tables there's only a small chance that many compactions will coincide.
Is there a mathematical model for calculating how multiple compactions might overlap in an application with several tables? Based on a cursory analysis, it seems that the likelihood of multiple overlapping compactions is small, especially when we are dealing with dozens of independent tables.
You're absolutely right:
With the size-tiered compaction strategy, a compaction may temporarily double the disk requirements. However, it doesn't double the entire disk usage, only the space of the sstables involved in that compaction (see also my blog post on size-tiered compaction and its space amplification). There is indeed a difference between "the entire disk usage" and just "the sstables involved in this compaction", for two reasons:
As you noted in your question, if you have 10 tables of similar size, compacting just one of them will work on just 10% of the data, so the temporary disk usage during compaction might be 10% of the disk usage, not 100%.
Additionally, Scylla is sharded, meaning that different CPUs handle their sstables, and compactions, completely independently. If you have 8 CPUs on your machines, each CPU only handles 1/8th of the data, so when it does compaction, the maximum temporary overhead will be 1/8th of the table's size - not the full table size.
The second reason cannot be counted on - since shards choose when to compact independently, if you're unlucky all shards may decide to compact the same table at exactly the same time, and worse - may happen to do their biggest compactions all at the same time. This "unluckiness" can also happen with 100% probability if you start a "major compaction" (nodetool compact).
The first reason, the one which you asked about, is indeed more useful and reliable: Beyond it being unlikely that all shards will choose to compact all sstables at exactly the same time, there is an important detail in Scylla's compaction algorithm which helps here: Each shard only does one compaction of a (roughly) given size at a time. So if you have many roughly-equal-sized tables, no shard can be doing a full compaction of more than one of those tables at a time. This is guaranteed - it's not a matter of probability.
Of course, this "trick" only helps if you really have many roughly-equal-sized tables. If one table is much bigger than the rest, or tables have very different sizes, it won't help you too much to control the maximum temporary disk use.
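If you want a rough number for the overlap question, a simple (and admittedly crude) independence model treats each table's compaction activity as an on/off process with some duty cycle. This is not something Scylla computes for you; the duty cycle, table count, and shard count below are invented for illustration.

```python
from math import comb

def p_overlap(tables, duty_cycle, at_least):
    """Probability that at least `at_least` of `tables` independent tables
    are compacting at the same instant (binomial model; ignores shards and
    the correlation introduced by a major compaction)."""
    return sum(comb(tables, k) * duty_cycle**k * (1 - duty_cycle)**(tables - k)
               for k in range(at_least, tables + 1))

# Example: 20 similar tables, each compacting ~5% of the time.
print(p_overlap(20, 0.05, at_least=3))
print(p_overlap(20, 0.05, at_least=5))

# Worst-case temporary space of one size-tiered compaction of one table on
# one shard, as a fraction of the total data (combining points 1 and 2 above):
tables, shards = 10, 8
print("temporary overhead fraction:", 1 / tables / shards)
```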
In issue https://github.com/scylladb/scylla/issues/2871 I proposed an idea of how Scylla could guarantee that when disk space is low, the sharding (point 2) is also used to reduce temporary disk space usage. We haven't implemented this idea, but instead implemented a better one - "incremental compaction strategy", which does huge compactions in pieces ("incrementally") to avoid most of the temporary disk usage. See this blog post for how this new compaction strategy works, and graphs demonstrating how it lowers the temporary disk usage. Note that Incremental Compaction Strategy is currently part of the Scylla Enterprise version (it's not in the open-source version).

What is CPBTree in SAP HANA?

I'm studying SAP HANA main memory database.
There is an index type called CPBTree in it. In its documentation, it is described as follows:
CPB+-tree stands for Compressed Prefix B+-Tree; this index tree type
is based on pkB-tree. CPB+-tree is a very small index because it uses
'partial key' that is only part of full key in index nodes.
This is a bit vague, and there is no other explanation of the CPBTree structure on the Internet.
Is there anyone who can explain more or introduce a good document?
Where to begin here?
B-trees are very intensely studied and developed data structures, so pointing to a single document that explains all aspects relevant to this question and SAP HANA is a bit difficult.
Maybe it helps to unpack the term first:
Compressed Prefix
This basically means that the B-tree index and leaf nodes do not contain the full key strings. Instead, the parts of the key strings that are common among the keys (the prefixes) are stored separately. The leaf and index nodes then only contain
the pointer to the prefix
a sort of "delta" that contains the remaining key (this is where the partial key from the pkB-tree comes in)
and a pointer to the data record (row id)
This technique is rather common in many DBMS, usually attached to a feature called "index compression" or something similar.
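To make the idea concrete, here is a tiny sketch of prefix compression. This is just the general concept, not the actual CPB+-tree node layout (which isn't public); the keys are invented.

```python
# Hypothetical illustration of the "compressed prefix" idea: store the
# shared prefix of a node's keys once, and keep only the short
# distinguishing suffixes (the "partial keys") plus pointers.
import os

def compress_node(sorted_keys):
    """Split sorted string keys into (common_prefix, partial_keys)."""
    prefix = os.path.commonprefix(sorted_keys)
    return prefix, [k[len(prefix):] for k in sorted_keys]

keys = ["customer:de:0001", "customer:de:0002", "customer:de:0107"]
prefix, partials = compress_node(keys)
print(prefix, partials)            # 'customer:de:0' ['001', '002', '107']

# Space comparison: full keys vs. prefix stored once plus partial keys.
print(sum(len(k) for k in keys), len(prefix) + sum(len(p) for p in partials))
```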
So, now we know that HANA uses compressed B-tree indexes (for row-store tables and for data that can be expressed as strings).
Why is this important for an in-memory database like HANA?
In short: memory transfer effort between RAM and CPU.
The smaller the index structure, the more of it can fit into the CPU caches. To traverse (go through) the index, fewer back-and-forth movements of data have to be performed.
It's a huge performance advantage.
This is complemented with specific "cache-conscious" index protocols (how the index structure is used by the HANA kernel) that try to minimize the RAM-CPU data transfers.
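A quick back-of-the-envelope illustration of why the smaller entries matter; all sizes below are invented for the example.

```python
# Rough illustration: more of a compressed index fits into the CPU cache,
# so traversals touch RAM less often. Sizes are made up for illustration.
l2_cache_bytes = 1 * 1024 * 1024          # assume a 1 MiB per-core L2 cache
full_key_entry = 40 + 8                   # 40-byte full key + 8-byte pointer
partial_key_entry = 6 + 8                 # 6-byte partial key + 8-byte pointer

print("entries cached with full keys:   ", l2_cache_bytes // full_key_entry)
print("entries cached with partial keys:", l2_cache_bytes // partial_key_entry)
```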
All this is an overly simplified explanation and I hope that it helps to make more sense nevertheless.
If you want to "dive deeper" and start reading academic papers around that topic, then "Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems" by Prof. Sang K. Cha et al. is a good starting point.
This is the same Sang K. Cha that created P*Time, an in-memory (row-store) DBMS in the early 2000s.
As is rather well known, P*Time was acquired by SAP (like so many other DBMS software products and companies... Sybase... MaxDB... OrientDB...) and the technology was used as a research base for what would become SAP HANA.
Nowadays, only a small part of P*Time remains in SAP HANA, mostly in the form of concepts and algorithms rather than actual P*Time code.
All in all, for the user of HANA (developer, admin, data consumer) the specifics of this index implementation hardly matter as none of them can interact with the index structure directly.
What matters is that this index takes modern server systems (many cores, large CPU caches, lots of RAM) and extracts great performance from them, while still allowing for "high-speed" transactions.
I added an extended write-up of this answer to my blog: https://lbreddemann.org/what-is-cpb-tree-in-sap-hana/.

Database Shard count recommendation (power of 2)

I was reading this post and the author advises having a shard count that is a power of two.
What benefit do we get from it? Why can't it be a simple number like 500, 150 or 1000?
A typical growth pattern for distributed data systems is to double the cluster size when needed. This allows for more even rebalancing of data and minimizes the effect of any hotspots.
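To see why doubling rebalances so cleanly, here is a small sketch assuming plain hash-modulo placement (the post you read may use a different scheme). When the shard count doubles, each key either stays put or moves to one predictable new shard; with an arbitrary new count, keys scatter across all shards.

```python
# Sketch: rebalancing under hash % shards when the shard count changes.
# Going from N to 2N shards, a key on shard s either stays on s or moves
# to shard s + N -- no data shuffles between existing shards.
import hashlib

def shard_for(key, shards):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % shards

keys = [f"user:{i}" for i in range(100_000)]

def moved_fraction(old, new):
    return sum(shard_for(k, old) != shard_for(k, new) for k in keys) / len(keys)

print("4 -> 8 shards:", moved_fraction(4, 8))   # ~0.5, each key's new home is predictable
print("4 -> 5 shards:", moved_fraction(4, 5))   # ~0.8, keys scatter across all shards
```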
Here is an in-depth discussion on database sharding that you may find useful. (Disclosure: dbShards is one of my company's products)

How do I know how many partitions a DynamoDB table is spread over?

Amazon's DynamoDB is designed for guaranteed performance. A customer must provision throughput for each of its tables.
To achieve this performance, tables are transparently spread over multiple "servers", AKA "partitions".
Amazon provides us with a "best practice" guide for dimensioning and optimizing the throughput. In this guide, we are told that the provisioned throughput is evenly divided over the partitions. In other words, if the requests are not evenly distributed over the partitions, only a fraction of the reserved (and paid-for) throughput will be available to the application.
In the worst case scenario, it will be:
worst_throughput = provisioned_and_paid_throughput / partitions
To estimate this "worst_throughput", I need to know the total number of partitions. Where can I find it, or how do I estimate it?
It says, "When storing data, Amazon DynamoDB divides a table's items into multiple partitions, and distributes the data primarily based on the hash key element."
What you really want to know is the throughput of a single partition. It seems like you can test that by hammering a single key.
See this page: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions
It has some simple calculations you can carry out based on the amount of read and write capacity you provision. Note that this only applies to initial capacity; as your usage of DynamoDB continues, these calculations will have less and less relevance.
A single partition can hold approximately 10 GB of data, and can support a maximum of 3,000 read capacity units or 1,000 write capacity units.
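Here is a rough sketch of the estimate based on those figures from the (now historical) guide; the guide's rule, as I understand it, is that whichever constraint (size or throughput) requires more partitions wins. The input numbers below are just examples.

```python
import math

def estimate_partitions(read_capacity_units, write_capacity_units, table_size_gb):
    """Partition estimate per the old DynamoDB guidelines: a partition
    supports ~3,000 RCU or ~1,000 WCU and holds ~10 GB of data."""
    by_throughput = math.ceil(read_capacity_units / 3000 + write_capacity_units / 1000)
    by_size = math.ceil(table_size_gb / 10)
    return max(by_throughput, by_size)

rcu, wcu, size_gb = 5000, 1000, 80            # example inputs
partitions = estimate_partitions(rcu, wcu, size_gb)
worst_read_throughput = rcu / partitions       # the question's worst-case formula
print(partitions, worst_read_throughput)
```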

Databases: More questions about (B-Tree) indexes

I've been studying indexes and there are some questions that bother me and which I think are important.
If you can help or refer to sources, please feel free to do it.
Q1: B-tree indexes can provide fast access to specific rows in a table. Considering an OLTP system with many simultaneous accesses, both reads and writes, do you think having many B-tree indexes on this system can be a disadvantage? Why?
Q2: Why are B-Tree indexes not fully occupied (typically only 75% occupied, if I'm not mistaken)?
Q1: I've no administration experience with large indexing systems in practice, but the typical multiprocessing-environment drawbacks apply to having multiple B-tree indexes on a system -- the cost of context switching, cache invalidation and flushing, poor IO scheduling, and the list goes on. On the other hand, IO is something that inherently ought to be non-blocking for maximal use of resources, and it's hard to do that without some sort of concurrency, even if done in a cooperative manner. (For example, some people recommend event-based systems.) Also, you're going to need multiple index structures for many practical applications, especially if you're looking at OLTP. The biggest things here are good IO scheduling, access patterns, and data caching suited to those access patterns.
Q2: Because splitting and re-balancing nodes is expensive. The naive methodology for speed is "only split when they're full." Given this, there are two extremes -- a node was just split and is half full, or a node is full and about to be split. The "average" between the two cases (50% and 100%) is 75%. Yes, it's somewhat loose logic from a mathematics perspective, but it exposes the underlying reason why the 75% figure appears.
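If you want to watch that figure emerge, here is a toy simulation of leaf occupancy under the "split only when full" rule. It is a deliberately simplified model, not any real engine's algorithm; with random insertions the average tends to settle near ln 2 ≈ 69%, in the same ballpark as the estimate above.

```python
import random, bisect

# Toy model: each leaf is a sorted list of keys; a full leaf splits into
# two half-full leaves around its median key. We then measure how full the
# leaves are on average after many random insertions.
CAPACITY = 64
leaves = [[]]                 # each leaf is a sorted list of keys
lows = [0.0]                  # lower key bound of each leaf, kept sorted

for _ in range(100_000):
    key = random.random()
    i = bisect.bisect_right(lows, key) - 1     # leaf covering this key
    bisect.insort(leaves[i], key)
    if len(leaves[i]) > CAPACITY:              # split around the median
        mid = CAPACITY // 2
        right = leaves[i][mid:]
        leaves[i] = leaves[i][:mid]
        leaves.insert(i + 1, right)
        lows.insert(i + 1, right[0])

print("average leaf occupancy:",
      sum(len(l) for l in leaves) / (len(leaves) * CAPACITY))
```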
