JBOD on Cassandra issue

I have a 3-node Cassandra cluster with a JBOD configuration. Each node has 4 data disks: /data1, /data2, /data3, /data4. We are frequently running into disk space issues on the nodes. Currently on node 1, /data1 is full (100%) while the other disks are only partially used: /data2 (26%), /data3 (34%), /data4 (17%). The other nodes have sufficient space on all disks.
1) So my question is: if requests come into the cluster and the data is destined for /data1 on node 1, what will happen? Will the request fail, or will Cassandra handle it and write to the other disks that still have space?
2) Is JBOD useful in Cassandra for anything besides recovering from a single disk failure?
Thanks in advance!

The data should be distributed evenly across the disks. Did you add those disks one at a time or all at once?
You can read more about how this works in Anthony Grasso's article at http://thelastpickle.com/blog/2018/08/22/the-fine-print-when-using-multiple-data-directories.html
His suggestion, and mine, is that if you want to use many disks/data dirs, try combining them with LVM or ZFS instead.
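As a rough sketch, combining the four disks with LVM could look something like the following (the device names, volume names, and mount point are hypothetical; adapt them to your hardware, and you would need to move the existing SSTables onto the new volume before restarting Cassandra):
# create one logical volume spanning the four physical disks
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate cassandra_vg /dev/sdb /dev/sdc /dev/sdd /dev/sde
lvcreate -l 100%FREE -n cassandra_lv cassandra_vg
mkfs.xfs /dev/cassandra_vg/cassandra_lv
mount /dev/cassandra_vg/cassandra_lv /data
Then cassandra.yaml points at a single data directory instead of four:
data_file_directories:
    - /data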

Related

ClickHouse - How to remove node from cluster for reading?

Background
I'm beginning work to set up a ClickHouse cluster with 3 CH nodes. The first node (Node A) would be write-only, and the remaining 2 (Nodes B + C) would be read-only. By this I mean that writes for a given table to Node A would automatically replicate to Nodes B + C. When querying the cluster, reads would only be resolved against Nodes B + C.
The purpose for doing this is two-fold.
This datastore serves both real-time and background jobs. Both are high volume, but only on the read side, so it makes sense to segment the traffic. Node A would be used for writing to the cluster and for all background reads. Nodes B + C would be used strictly for the UX.
The volume of writes is very low, perhaps 1 write per 10,000 reads. Data is entirely refreshed once per week. Background jobs need to be certain that the most current data is being read before they can be kicked off. Reading off of replicas introduces eventual consistency as a concern, so reading directly from Node A (rather than through the cluster) guarantees the data is strongly consistent.
Question
I'm not finding much specific information in the CH documentation, and am wondering whether this might be possible. If so, what would the cluster configuration look like?
Yes, it is possible to do so. But wouldn't the better solution be to read from and write to the servers through a Distributed table?
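For what it's worth, here is a minimal sketch of what the read-only side of such a cluster definition could look like (the cluster name read_cluster, the host names, and the table names are made up, and it assumes the underlying table is a ReplicatedMergeTree table so that writes on Node A replicate to Nodes B + C through ZooKeeper/Keeper):
<remote_servers>
    <read_cluster>
        <shard>
            <replica>
                <host>node-b</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node-c</host>
                <port>9000</port>
            </replica>
        </shard>
    </read_cluster>
</remote_servers>
A Distributed table pointed at that cluster would then only ever resolve reads against Nodes B + C, while writes and background reads go to Node A directly:
CREATE TABLE events_read AS events
ENGINE = Distributed(read_cluster, default, events);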

What to do when nodes in a Cassandra cluster reach their limit?

I am studying up on Cassandra and am in the process of setting up a cluster for a project I'm working on. Consider this example:
Say I set up a 5-node cluster with 200 GB of space on each node. That adds up to 1000 GB (roughly 1 TB) of space overall. Assuming my partitions are split equally across the cluster, I can easily add nodes and achieve linear scalability. However, what if these 5 nodes start approaching the 200 GB SSD limit? In that case, I can add 5 more nodes and the partitions would then be split across 10 nodes. But the older nodes would still be receiving writes, as they are part of the cluster. Is there a way to make these 5 older nodes 'read-only'? I want to fire off random read queries across the entire cluster, but I don't want to write to the older nodes anymore (as they are capped by the 200 GB limit).
Help would be greatly appreciated. Thank you.
Note: I can say that 99% of the queries will be write queries, with 1% or less for reads. The app has to persist click events in Cassandra.
Usually when a cluster reaches its limit, we add a new node to the cluster. After adding a new node, the old Cassandra nodes redistribute part of their data to the new node. After that we run nodetool cleanup on every node to clean up the data that was handed off to the new node. This entire scenario happens within a single DC.
For example:
Suppose you have 3 nodes (A, B, C) in DC1 and 1 node (D) in DC2. Your nodes are reaching their limit, so you decide to add a new node (E) to DC1. Nodes A, B, and C will hand off part of their data to node E, and you then run nodetool cleanup on A, B, and C to reclaim the space.
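For example, once node E has finished bootstrapping, you would run something like this on each of A, B, and C:
nodetool status     # confirm the new node shows as UN (Up/Normal)
nodetool cleanup    # drop the data for token ranges that now belong to node E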
There are really two separate problems in this question.
I am assuming you know that by adding 5 new nodes, some of the data load will be transferred to them, as some token ranges get assigned to the new nodes.
Given that, if your concern is that the old 5 nodes would not be able to accept writes because they have reached their limit, that is not going to happen: the new nodes share the data load, so the old nodes have free space again for further writes.
Isolating reads and writes to specific nodes is a different problem entirely. If you want to serve reads only from the old 5 nodes and writes only from the new 5 nodes, the best way to do this is to add the new 5 nodes as another datacenter within the same cluster, and then use different consistency levels for reads and writes to effectively make the old datacenter read-only.
But the new datacenter will not lighten the data load on the first one; it will take on the same load itself. So you would need more than 5 new nodes to solve both problems at once: some nodes to relieve the storage pressure and others to isolate reads from writes by forming the new datacenter, and that new datacenter should itself have more than 5 nodes. Best practice is to monitor the data load and fix it before such a problem happens, by adding new nodes or increasing the storage limit.
Having done that, you will also need to make sure that the nodes you use for reads and for writes are in different datacenters.
Consider the following situation:
dc1(n1, n2, n3, n4, n5)
dc2(n6, n7, n8, n9, n10)
Now, for reads you connect to node n1 and for writes you connect to node n6.
The read/write isolation can then be achieved by choosing the right consistency level from the options below:
LOCAL_QUORUM
or
LOCAL_ONE
These basically confine the search for replicas to the local datacenter only.
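As a rough sketch (the keyspace name my_ks and the replication factors are placeholders, and dc1/dc2 must match the datacenter names your snitch reports), the keyspace needs replicas in both datacenters, and each client then uses a LOCAL_ consistency level against its own datacenter:
ALTER KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
-- in cqlsh (or via the equivalent driver option), pin requests to the local DC:
CONSISTENCY LOCAL_QUORUM;
After changing the replication, you would also run nodetool rebuild on the new datacenter's nodes so they stream in the existing data.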
Look at these references for more:
Adding a datacenter to a cluster
and
Consistency Levels

Can we have Cassandra-only nodes and Solr-enabled nodes in the same datacenter?

I just started with Solr and would like your suggestion on the scenario below. We have 2 datacenters with 3 nodes in each datacenter (in different AWS regions for location advantage). We have a requirement for which I've been asked whether we can have 2 Solr nodes in each datacenter, so it would be 2 Solr nodes and 1 Cassandra-only node per datacenter. I want to understand whether this kind of setup is fine, and I am a little confused about whether the Solr nodes will hold data along with the indexes: do all 6 nodes share the data, with the 4 Solr nodes holding indexes in addition to data? Kindly provide some information on this. Thanks.
Short answer is no, this will not work. If you turn on DSE Search on one node in a DC you need to turn it on for all the nodes in the DC.
But why??
DSE Search builds Lucene indexes on the data that is stored locally on each node. Say you have a 3-node DC with RF=1 (so each node holds only 1/3 of the data) and you only turn on Search on one of the nodes: your searches can only cover that node's 1/3 of the data, so search queries will fail to return complete results.
So I should just turn search on everywhere?
If you have relatively small workloads with loose SLAs (both C* and Search) and/or you are over-provisioned, you may be fine turning on Search on your main Cassandra nodes. However, in many cases with heavy C* workloads and tight SLAs, Search queries will negatively affect Cassandra performance (because they are contending for the same hardware).
I need search nodes in both Physical DC's
If you want search enabled only in two out of your three nodes in a physical DC, the only way to do this is to actually split up your physical DC into two logical DC's. In your case you would have:
US - Cassandra
US - Search
Singapore - Cassandra
Singapore - Search
This gives you geographic locality for your search and C* queries and also provides workload isolation between your C* and Search workloads, since they no longer contend for the same hardware.
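A sketch of what the keyspace replication might look like once the DCs are split (the keyspace name and replication counts are made up, and the logical DC names must match what the nodes actually report, e.g. in nodetool status):
ALTER KEYSPACE my_ks WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'US_Cassandra': 1, 'US_Search': 2,
    'Singapore_Cassandra': 1, 'Singapore_Search': 2
};
Your application would then use a DC-aware load balancing policy in the driver to send search queries to the Search DCs and plain C* traffic to the Cassandra DCs.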

Migrating Riak data when ring size changes

Is it trivial? I will be using Bitcask and file backups (of the files on each node).
Let's say my initial ring size is 256 with 16 nodes. Now, if I am required to expand to a ring of 1024, can I set up 16 new instances configured with a ring size of 1024, copy the backup files from the old cluster onto these 16 new instances, and start Riak up? Will Riak be able to pick up this old data?
I guess not, since the partition IDs and their mapping to individual nodes may also change once the ring size is changed. But what other way is there? Will riak-backup work in this case (when the ring size changes)?
I just want to know that the choice I've made is future-proof enough. Obviously, at some point when the requirements change drastically or the user base balloons, the entire architecture might need to be changed. But I do hope to be able to make this sort of change (to the ring size) at some point, naturally with SOME effort involved, but without it being impossible.
Migrating clusters to a different ring size is difficult to do with node-based file backups (meaning, if you just back up the /data directories on each node, as recommended in Backing Up Riak), because, as you suspected, the backend data files depend on the mapping of nodes and partitions for a given ring size.
What should you do instead?
You have to use "logical" backups of the entire cluster, using one of these two tools:
riak-admin backup and restore (which does in fact work with clusters of different ring sizes), or
the Riak Data Migrator
Either one basically dumps the contents of the entire cluster into one location (so be careful not to run out of disk space, obviously), which you can then transfer and restore into your new cluster with its different ring size.
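For reference, the backup/restore invocation looks roughly like this (the node name, Erlang cookie, and backup path are placeholders, and the exact arguments can differ between Riak versions, so check the docs for the version you are running):
riak-admin backup riak@old-node1.example.com riak /backups/riak-full-backup all
riak-admin restore riak@new-node1.example.com riak /backups/riak-full-backup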
Things to watch out for:
Only do backups of non-live clusters. Meaning, either take the cluster down, or at least make sure no new writes are hitting the old cluster while the backup is taking place. Otherwise, if new writes keep coming in during the backup, there is no guarantee they will make it into the backed-up data set.
Be sure to transfer the app.config and custom bucket settings to the new cluster before doing backup/restore.
Hopefully this helps. So, it's not trivial (meaning, it'll take a while and will require a lot of disk space, but that's true whenever you're transferring large amounts of data), but it's not extremely complicated either.
I know this is an old question, but with Riak 2.x it is now possible to resize the ring dynamically without shutting down the cluster:
riak-admin cluster resize-ring <new_size>
riak-admin cluster plan
riak-admin cluster commit
Note: The size of a Riak ring should always be a power of 2 (2^n), e.g. 16, 32, 64, etc.
http://docs.basho.com/riak/latest/ops/advanced/ring-resizing/

Which NoSQL Database for Mostly Writing

I'm working on a system that will generate and store large amounts of data to disk. A previously developed system at the company used ordinary files to store its data, but for several reasons it became very hard to manage.
I believe NoSQL databases are a good solution for us. What we are going to store is mostly documents (usually around 100 KB, but occasionally much larger or smaller) annotated with some metadata. Query performance is not the top priority; the priority is writing in a way that keeps I/O overhead as small as possible. The rate of data generation is about 1 Gbps, and we might move to 10 Gbps (or even more) in the future.
My other requirement is the availability of a (preferably well documented) C API. I'm currently testing MongoDB. Is this a good choice? If not, what other database system can I use?
The rate of data generation is about 1Gbps,... I'm currently testing MongoDB. Is this a good choice?
OK, so just to clarify, your data rate is ~1 gigaBYTE per 10 seconds. So you are filling a 1 TB hard drive every couple of hours or so?
MongoDB has pretty solid write rates, but it is ideally used in situations with a reasonably high RAM-to-data ratio. You want to keep at least the primary indexes in memory, along with some of the data.
In my experience, you want about 1GB of RAM for every 5-10GB of Data. Beyond that number, read performance drops off dramatically. Once you get to 1GB of RAM for 100GB of data, even adding new data can be slow as the index stops fitting in RAM.
The big key here is:
What queries are you planning to run and how does MongoDB make running these queries easier?
Your data is very quickly going to occupy enough space that basically every query will just be going to disk. Unless you have a very specific indexing and sharding strategy, you end up just doing disk scans.
Additionally, MongoDB does not support compression. So you will be using lots of disk space.
If not, what other database system can I use?
Have you considered compressed flat files? Or possibly a big data Map/Reduce system like Hadoop (I know Hadoop is written in Java)
If a C API is a key requirement, maybe you want to look at Tokyo/Kyoto Cabinet?
EDIT: more details
MongoDB does not support full-text search. You will have to look to other tools (Sphinx/Solr) for such things.
Large indices defeat the purpose of using an index.
According to your numbers, you are writing around 1,200 documents per second, which is roughly 4.4 million documents per hour, or over 100 million per day. Each document needs about 16+ bytes for an index entry: 12 bytes for the ObjectID + 4 bytes for the pointer into the 2GB data file + 1 byte for the pointer to the file + some amount of padding.
Let's say that every index entry needs about 20 bytes; then your index is growing at roughly 90 MB per hour, or about 2 GB per day. And that's just the default _id index.
Within the first days your working set will no longer fit into RAM, and before long even the _id index alone won't, at which point your performance will start to drop off dramatically. (This behaviour is well documented for MongoDB.)
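If it helps, here is that back-of-the-envelope calculation as a quick Python sketch (the 100 KB average document size comes from the question and the ~20 bytes per index entry is the rough estimate above):
# Rough estimate of document throughput and _id index growth.
ingest_bytes_per_sec = 1e9 / 8        # ~1 Gbps expressed in bytes per second
avg_doc_bytes = 100 * 1024            # ~100 KB per document
index_entry_bytes = 20                # rough size of one _id index entry

docs_per_day = ingest_bytes_per_sec * 86400 / avg_doc_bytes
index_bytes_per_day = docs_per_day * index_entry_bytes

print(f"~{docs_per_day / 1e6:.0f}M documents per day, "
      f"~{index_bytes_per_day / 1e9:.1f} GB of _id index per day")
# prints roughly: ~105M documents per day, ~2.1 GB of _id index per day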
So it's going to be really important to figure out which queries you want to run.
Have a look at Cassandra. It executes writes much faster than reads. That's probably what you're looking for.
