Time required to add a node to a Cassandra cluster?

We have a Cassandra cluster of 20 nodes holding 30 TB. We want to increase the cluster's capacity to 130 TB. For us, it takes around 1 day to add a new node, so I want to understand whether it is possible to reduce this time.
How do we increase the Cassandra cluster's capacity for that much data, and how much time will it take?

Related

What decides the number of partitions in a DynamoDB table?

I'm a beginner to DynamoDB, and my online instructor doesn't answer his Q&A lol, so I've been confused about this.
I know that the partition key decides the partition in which the item will be placed.
I also know that the number of partitions is calculated based on throughput or storage using the well-known formulas.
So let's say a table has user_id as its partition key, with 200 user_ids. Does that automatically mean that we have 200 partitions? If so, why didn't we calculate the number of partitions based on those formulas?
Thanks
Let's establish 2 things.
A DynamoDB partition can support 3000 read operations and 1000 write operations per second. It keeps a divider between read and write ops so they do not interfere with each other. If you had a table configured to support 18,000 reads and 6,000 writes per second, you'd have at least 12 partitions, but probably a few more for some headroom.
A provisioned capacity table has 1 partition by default, but an on-demand table has 4 partitions by default.
So, to answer your question directly: just because you have 200 user_ids does not mean you have 200 partitions. It is very possible for those 200 items to sit in just one partition if your table is in provisioned capacity mode. If the configuration of the table changes or it takes on more traffic, those items might move around to new partitions.
There are a few distinct times where DynamoDB will add partitions.
When a partition grows larger than 10 GB of storage. DynamoDB might see that you are taking on data and split proactively, but 10 GB is the cutoff.
When your table needs to support more operations per second than it currently does. This can happen manually, because you reconfigured your table to support 20,000 reads/sec where before it only supported 2,000; DynamoDB has to add partitions and move data to be able to handle those 20,000 reads/sec. Or it can happen automatically, because you configured floor and ceiling values in DynamoDB auto-scaling (a sketch of such a configuration appears below, after this list) and DynamoDB senses your ops/sec climbing, so it adjusts the number of partitions in response to capacity exceptions.
Your table is in on-demand capacity mode and DynamoDB attempts to automatically keep 2x your previous high-water mark of capacity at the ready. For example, say your table just reached 10,000 RCU for the first time. DynamoDB would see that this is past your previous high-water mark and start adding partitions as it tries to keep 2x that capacity at the ready in case you peak again like you just did.
DynamoDB is actively monitoring your table, and if it sees that one or more items are being hit particularly hard (hot keys) and sit in the same partition, this can create a hot partition. If that is happening, DynamoDB might split the partition to help isolate those items and prevent or fix a hot-partition situation.
There are one or two other, rarer edge cases, but you'd likely be talking to AWS Support if you encountered them.
Note: Once DynamoDB creates partitions, the number of partitions never shrinks and this is ok. Throughput dilution is no longer a thing in DynamoDB.
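For reference, here is a minimal sketch of the kind of auto-scaling floor/ceiling configuration mentioned in the list above, using Python and boto3. The table name "MyTable" and the capacity numbers are made up; the real values depend on your workload.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register a floor (MinCapacity) and ceiling (MaxCapacity) for the table's read capacity.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",                            # hypothetical table name
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=2000,
    MaxCapacity=20000,
)

# Target-tracking policy: keep consumed reads near 70% of provisioned reads.
autoscaling.put_scaling_policy(
    PolicyName="MyTable-read-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/MyTable",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)

Scaling the provisioned throughput this way is what can trigger partition adds behind the scenes; the partitions themselves are never something you manage directly.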
The partition key value is hashed to determine the actual partition to place the data item into.
Thus the number of distinct partition key values has zero effect on the number of physical partitions.
The only things that affect the physical number of partitions are RCUs/WCUs (throughput) and the amount of data stored.
Number of partitions from throughput: Pt = RCU / 3000 + WCU / 1000
Number of partitions from storage: Ps = GB / 10
Unless one of the above is more than 1.0, there will likely be only a single partition. The split presumably happens as you approach the limits; exactly when is something only AWS knows.
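As a quick sanity check, here is a tiny worked example in plain Python, plugging in the 18,000 reads / 6,000 writes figures from the earlier answer; the 55 GB storage figure is made up.

import math

# Throughput-based partition estimate: Pt = RCU/3000 + WCU/1000
rcu, wcu = 18000, 6000
pt = rcu / 3000 + wcu / 1000      # 6 + 6 = 12 partitions

# Storage-based partition estimate: Ps = GB/10
storage_gb = 55
ps = storage_gb / 10              # 5.5 -> at least 6 partitions

# The classic rule of thumb was to take the larger of the two estimates, rounded up.
partitions = max(math.ceil(pt), math.ceil(ps))
print(partitions)                 # 12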

JBOD on Cassandra issue

I have a 3-node Cassandra cluster with a JBOD configuration and 4 disks for data: /data1, /data2, /data3, /data4. We are frequently facing disk space issues on the nodes. Currently /data1 is full (100%) while the other disks on node 1 are at /data2 (26%), /data3 (34%), /data4 (17%), and the other nodes have sufficient space on all disks.
1) So my question is: if a request comes to the Cassandra cluster and its data would go to /data1 on node 1, what will happen? Will the request fail, or will Cassandra manage it and write to the other disks that still have space?
2) Is JBOD useful in Cassandra for anything other than disk disaster recovery?
Thanks in advance!
The data should be getting distributed evenly across the disks. Did you add those disks one at a time or all at once?
You can read more about how this works in Anthony Grasso's article at http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html
His suggestion, and mine, is that if you want to use many disks/data dirs, try combining them with LVM or ZFS.

What to do when nodes in a Cassandra cluster reach their limit?

I am studying up on Cassandra and am in the process of setting up a cluster for a project I'm working on. Consider this example:
Say I set up a 5-node cluster with 200 GB of space on each node. That adds up to 1000 GB (roughly 1 TB) of space overall. Assuming that my partitions are equally split across the cluster, I can easily add nodes and achieve linear scalability. However, what if these 5 nodes start approaching the SSD limit of 200 GB? In that case, I can add 5 more nodes and the partitions would then be split across 10 nodes. But the older nodes would still be writing data, as they are part of the cluster. Is there a way to make these 5 older nodes 'read-only'? I want to shoot off random read queries across the entire cluster, but don't want to write to the older nodes anymore (as they are capped by the 200 GB limit).
Help would be greatly appreciated. Thank you.
Note: I can say that 99% of the queries will be write queries, with 1% or less for reads. The app has to persist click events in Cassandra.
Usually when a cluster reaches its limit, we add a new node to the cluster. After adding the new node, the old Cassandra nodes distribute some of their data to it, and after that we run nodetool cleanup on every old node to clean up the data that was moved to the new node. The entire scenario happens within a single DC.
For example:
Suppose you have 3 nodes (A, B, C) in DC1 and 1 node (D) in DC2, and your nodes are reaching their limit, so you decide to add a new node (E) to DC1. Nodes A, B, and C will distribute some of their data to node E, and we then run nodetool cleanup on A, B, and C to reclaim the space.
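A minimal sketch of that cleanup step in Python, assuming hypothetical hostnames and passwordless SSH to the nodes; in practice you might just run nodetool cleanup by hand on each node.

import subprocess

# The pre-existing DC1 nodes that gave up token ranges to the new node E.
# The new node itself does not need cleanup.
old_nodes = ["node-a.example.com", "node-b.example.com", "node-c.example.com"]

for host in old_nodes:
    # Run cleanup one node at a time: it rewrites SSTables to drop data the
    # node no longer owns, which costs disk I/O and temporary disk space.
    subprocess.run(["ssh", host, "nodetool", "cleanup"], check=True)

Only run cleanup after the new node has fully joined the ring (showing UN in nodetool status).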
It is a bit hard to understand the question properly, so I'll address both concerns.
I am assuming you know that by adding 5 new nodes, some of the data load is transferred to the new nodes, since some token ranges are assigned to them.
Given that, if you are concerned that the old 5 nodes would not be able to accept writes because they have reached their limit, that is not going to happen: the new nodes share the data load, so the old nodes now have free space for further writes.
Isolating reads and writes to particular nodes is a different problem entirely. But if you want to direct reads to the old 5 nodes only and writes to the new 5 nodes, the best way to do this is to add the new 5 nodes as another datacenter under the same cluster, and then use different consistency levels for reads and writes to make the old datacenter effectively read-only.
But the new datacenter will not lighten the data load of the first one; it will take on the same load itself. (So you would need more than 5 nodes to solve both problems simultaneously: a few nodes to lighten the load, and the others to isolate reads from writes by forming a new datacenter with them. That new datacenter should itself have more than 5 nodes.) Best practice is to monitor the data load and fix it before such a problem happens, by adding new nodes or increasing the capacity per node.
Having done that, you will also need to ensure that the nodes you connect to for reads and for writes are in different datacenters.
Consider you have the following situation:
dc1(n1, n2, n3, n4, n5)
dc2(n6, n7, n8, n9, n10)
Now, for reads you connect through node n1, and for writes you connect through node n6.
The read/write isolation can then be done by choosing the right consistency levels from the options below:
LOCAL_QUORUM
or
LOCAL_ONE
These basically confine the request to replicas in the local datacenter only.
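A minimal sketch of that split with the DataStax Python driver follows; the keyspace and table names are hypothetical, and note that LOCAL_* levels only control which replicas must acknowledge a request, while the keyspace replication settings still decide where data is stored.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

def dc_session(contact_point, local_dc, consistency):
    # Pin the session to one datacenter and one LOCAL_* consistency level.
    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc),
        consistency_level=consistency,
    )
    cluster = Cluster([contact_point],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster.connect("clicks_ks")   # hypothetical keyspace

reads = dc_session("n1", "dc1", ConsistencyLevel.LOCAL_QUORUM)   # old datacenter
writes = dc_session("n6", "dc2", ConsistencyLevel.LOCAL_ONE)     # new datacenter

writes.execute(
    "INSERT INTO click_events (id, ts) VALUES (uuid(), toTimestamp(now()))")
rows = reads.execute("SELECT id, ts FROM click_events LIMIT 10")

This assumes the keyspace uses NetworkTopologyStrategy with replicas in both dc1 and dc2.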
Look at these references for more:
Adding a datacenter to a cluster
and
Consistency Levels

DSE SOLR OOMing

We have had a 3-node DSE Solr cluster running and recently added a new core. After about a week of running fine, all of the Solr nodes are now OOMing. They fill up both the JVM heap (set at 8 GB) and the system memory. They are also constantly flushing the memtables to disk.
The cluster is DSE 3.2.5 with RF=3
here is the solrconfig from the new core:
http://pastie.org/8973780
How big is your Solr index relative to the amount of system memory available for the OS to cache file system pages? Basically, your Solr index needs to fit in the OS file system cache (the amount of system memory left over after DSE has started but has not yet processed any significant amount of data).
Also, how many Solr documents (Cassandra rows) and how many fields (Cassandra columns) are populated on each node? There is no hard limit, but 40 to 100 million is a good guideline as an upper limit - per node.
And, how much system memory and how much JVM heap is available if you restart DSE, but before you start putting load on the server?
For RF=N, where N is the total number of nodes in the cluster or at least the search data center, all of the data will be stored on all nodes, which is okay for smaller datasets, but not okay for larger datasets.
For RF=n, this means that each node will hold X * n / N rows or documents, where X is the total number of rows or documents across all column families in the data center. X * n / N is the number that you should try to keep below 100 million. That's not a hard limit - some datasets and hardware might be able to handle substantially more, and some might not even be able to hold that much. You'll have to discover the number that works best for your own app, but the 40 million to 100 million range is a good start.
In short, the safest estimate is to keep X * n / N under 40 million for Solr nodes. 100 million may be fine for some datasets and beefier hardware.
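A tiny worked example of that per-node estimate, in plain Python; the 300 million document count is made up, while the node count and RF match this cluster.

total_docs = 300_000_000   # X: documents across all indexed column families in the DC
nodes = 3                  # N: nodes in the search data center
rf = 3                     # n: replication factor

docs_per_node = total_docs * rf / nodes
print(docs_per_node)       # 300,000,000 -- with RF = N every node holds everything,
                           # far beyond the 40-100 million per-node guideline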
As far as tuning, one common source of using lots of heap is heavy use of Solr facets and filter queries.
One technique is to use "DocValues" fields for facets since DocValues can be stored off-heap.
Filter queries can be marked as cache=false to save heap memory.
Also, the various Solr caches can be reduced in size or even set to zero. That's in solrconfig.xml.
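As an illustration of the filter-cache point, here is a hedged sketch of a query against a DSE Search core over HTTP using Python requests; the core and field names are hypothetical, and it assumes the facet field is stored as docValues.

import requests

params = {
    "q": "*:*",
    # {!cache=false} tells Solr not to store this filter in the filterCache,
    # trading repeat-query speed for less heap.
    "fq": "{!cache=false}event_type:click",
    "facet": "true",
    "facet.field": "country",   # cheapest on heap when 'country' uses docValues
    "rows": 0,
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/clicks.events/select", params=params)
print(resp.json()["response"]["numFound"])

The cache sizes themselves (filterCache, queryResultCache, documentCache) live in solrconfig.xml, as mentioned above.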

Low cost way to host a large table yet keep the performance scalable?

I have a growing table storing time series data, 500M entries now, and 200K new records every day. The total size is around 15GB for now.
My clients are querying the table via a PHP script mostly, and the size of the result set is around 10K records (not very large).
select * from T where timestamp > X and timestamp < Y and additionFilters
And I want this operation cheap.
Currently my table is hosted in Postgres 7, on a single box with 16 GB of memory, and I would love to see some good suggestions for hosting this at low cost while also allowing me to scale up for performance if needed.
The table serves:
1. Query: 90%
2. Insert: 9.9%
3. Update: 0.1% <-- very rare.
PostgreSQL 9.2 supports partitioning and partial indexes. If there are a few hot partitions, and you can put those partitions or their indexes on a solid state disk, you should be able to run rings around your current configuration.
There may or may not be a low cost, scalable option. It depends on what low cost and scalable mean to you.
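A minimal sketch of the partitioning-plus-partial-index idea, assuming Python with psycopg2 and made-up names (t and ts stand in for the question's table and its timestamp column); on the 9.x series, partitioning means table inheritance plus CHECK constraints rather than the declarative syntax of later releases.

import psycopg2

conn = psycopg2.connect("dbname=timeseries")   # hypothetical database name
cur = conn.cursor()

# Inheritance-style partition for one month of data. The CHECK constraint lets
# the planner skip this child entirely when the query's time range misses it.
cur.execute("""
    CREATE TABLE t_2014_05 (
        CHECK (ts >= '2014-05-01' AND ts < '2014-06-01')
    ) INHERITS (t)
""")
cur.execute("CREATE INDEX t_2014_05_ts_idx ON t_2014_05 (ts)")

# Alternatively, a partial index keeps only the hot recent range indexed, so
# the index stays small enough to live in RAM or on a small SSD.
cur.execute("""
    CREATE INDEX t_recent_ts_idx ON t (ts)
    WHERE ts >= '2014-05-01'
""")

conn.commit()
cur.close()
conn.close()

With inheritance partitioning you also need an insert trigger (or application logic) to route new rows into the correct child table; constraint_exclusion, on by default for inheritance children, lets the planner skip the partitions your query doesn't touch.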
