The documentation [1] says that
if using a write consistency level of QUORUM with a replication factor
of 3, Cassandra will send the write to 2 replicas. If the write fails on
one of the replicas but succeeds on the other, Cassandra will report a
write failure to the client.
So assume only 2 replicas received the update and the write failed. But due to eventual consistency, all the nodes will eventually receive the update.
So, should I retry? Or just leave it as is?
Any recommended strategy?
[1] http://www.datastax.com/docs/1.0/dml/about_writes
Those docs aren't quite correct. Regardless of the consistency level (CL), writes are sent to all available replicas. If replicas aren't available, Cassandra won't send a request to the down nodes. If there aren't enough available from the outset to satisfy the CL, an UnavailableException is thrown and no write is attempted to any node.
However, the write can still succeed on some nodes while an error is returned to the client. In the example from [1], if one replica was down before the write was attempted, then what the docs describe is accurate.
So assume only 2 replicas received the update and the write failed. But due to eventual consistency, all the nodes will eventually receive the update.
Be careful though: a failed write doesn't tell you how many nodes the write was made to. It could be none, so the write may never propagate at all.
So, should I retry? Or just leave it as is?
In general you should retry, because the data may not have been written at all. You should only regard your write as written when you get a successful response from the write.
If you're using counters, though, you should be careful with retries. Because you don't know whether the write was made or not, you could get duplicate counts. For counters, you probably don't want to retry (since more often than not the write will have been made to at least one node, at least at higher consistency levels).
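To make the advice concrete, here is a minimal retry sketch using the DataStax Java driver 3.x; the contact point, keyspace, and table are hypothetical, and the bounded retry count is just an example. The point is that only a successful execute() lets you treat the write as done, and that retrying a regular (non-counter) write is safe because such writes are idempotent under last-write-wins.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class RetryWrite {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");            // hypothetical keyspace

        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO users (id, name) VALUES (1, 'alice')"); // hypothetical table
        insert.setConsistencyLevel(ConsistencyLevel.QUORUM);

        boolean written = false;
        for (int attempt = 0; attempt < 3 && !written; attempt++) {
            try {
                session.execute(insert);
                written = true;                 // only now regard the write as done
            } catch (WriteTimeoutException e) {
                // The write reached some (possibly zero) replicas; retrying is safe
                // for regular writes, but NOT a good idea for counter updates.
            } catch (UnavailableException e) {
                // Not enough replicas were up, so no write was attempted at all.
            }
        }
        cluster.close();                        // also closes the session
    }
}
```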
Retrying will not change much. The problem is that you actually cannot know whether the data was persisted at all, because Cassandra always throws the same exception.
You have a few options:
enable hints and retry the request with CL=ANY - a successful response would mean that at least a hint was created, so you know the data is there, just not yet necessarily accessible
disable hints and retry with CL=ONE - a successful response would mean that at least one node received the data; in case of error, execute a delete
use Astyanax and its retry strategy
upgrade to Cassandra 1.2 and use its write-ahead log
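The list above mentions Astyanax; for what it's worth, the DataStax Java driver 3.x has a comparable built-in mechanism. A hedged sketch (contact point hypothetical): the DowngradingConsistencyRetryPolicy retries a failed request at a lower consistency level, trading consistency guarantees for availability in much the same spirit as the CL=ANY option above.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;

public class DowngradingClient {
    public static Cluster build() {
        // On a write timeout, retry at a lower consistency level (possibly down to ONE)
        // and log every downgrade so the weaker guarantee is at least visible.
        return Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
                .build();
    }
}
```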
I am new to Cassandra and trying to figure out how Cassandra provides consistency in the case of failed writes. Consider the following scenario where the CL is QUORUM, in which case 2 out of 3 replicas must respond. The write request will go to all 3 replicas as usual; if the write fails on 2 replicas and succeeds on 1, Cassandra will return a failure to the client. Since Cassandra does not roll back, the record will continue to exist on the successful replica. Now, when a read comes in with CL=QUORUM, the read request will be forwarded to 2 replica nodes, and if one of them is the previously successful one, Cassandra will return the new record because it has the latest timestamp. But from the client's perspective this record was never written at all, since Cassandra returned a failure during the write.
If this is the case, then Cassandra will never be consistent in this scenario.
How should such a scenario be handled?
Please let me know if this understanding is correct.
Your understanding is correct. The client in this case will receive a write error (a write timeout rather than an UnavailableException, since the write was actually attempted), but should understand that the write will eventually propagate to the other replicas (if the nodes are alive or come back up), and that this is not a failed write in the sense of the data being lost.
For more details see the following articles:
https://www.datastax.com/blog/2014/10/cassandra-error-handling-done-right
http://mighty-titan.blogspot.com/2012/06/understanding-cassandras-consistency.html
I have an application where I need to store some data in a database (MySQL, for instance) and then publish some data to a message queue. My problem is: if the application crashes after the write to the database, my data will never be written to the message queue and will be lost (thus eventual consistency of my system will not be guaranteed).
How can I solve this problem?
In this particular case, the answer is to load the queue data from the database.
That is, you write the messages that need to be queued to the database, in the same transaction that you use to write the data. Then, asynchronously, you read that data from the database, and write it to the queue.
See Reliable Messaging without Distributed Transactions, by Udi Dahan.
If the application crashes, recovery is simple -- during restart, you query the database for all unacknowledged messages, and send them again.
Note that this design really expects the consumers of the messages to be designed for at least once delivery.
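A minimal JDBC sketch of the recovery/relay loop described above; the outbox table, its columns, and the publish() call are all hypothetical, and a real relay would run on a schedule rather than once. The messages themselves are assumed to have been inserted into this table in the same transaction as the business data.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OutboxRelay {

    // Re-send every message that was written to the database but not yet acknowledged.
    // Safe to run at any time, including right after a crash, because consumers are
    // expected to handle at-least-once delivery.
    public void relayPending(Connection conn) throws SQLException {
        try (PreparedStatement select = conn.prepareStatement(
                     "SELECT id, payload FROM outbox WHERE acknowledged = FALSE");
             ResultSet rs = select.executeQuery()) {

            while (rs.next()) {
                long id = rs.getLong("id");
                String payload = rs.getString("payload");

                publish(payload);                       // hypothetical queue client call

                try (PreparedStatement ack = conn.prepareStatement(
                             "UPDATE outbox SET acknowledged = TRUE WHERE id = ?")) {
                    ack.setLong(1, id);
                    ack.executeUpdate();
                }
            }
        }
    }

    private void publish(String payload) {
        // placeholder: hand the payload to your message broker client here
        System.out.println("would publish: " + payload);
    }
}
```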
I am assuming that you have a lossless message queue, where once you get a confirmation for writing data, the queue is guaranteed to have the record.
Basically, you need a loop with either a transaction that can roll back or a status column in the database. The pseudocode for the transaction approach is:
Begin transaction
Insert into database
Write to message queue
When message queue confirms, commit transaction
Personally, I would probably do this with a status:
Insert into database with a status of "pending" (or something like that)
Write to message queue
When message confirms, change status to "committed" (or something like that)
In the case of recovery from failure, you may need to check the message queue to see if any "pending" records were actually written to the queue.
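A sketch of the status variant, assuming MySQL-style JDBC; the table, columns, and the queue call are hypothetical. The row is committed as "pending" first, so a crash between the two steps leaves a record that the recovery check in the previous paragraph can find.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PendingThenCommitted {

    public void saveAndQueue(Connection conn, long orderId, String payload) throws SQLException {
        // 1. Insert the row with a "pending" status.
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO orders (id, payload, status) VALUES (?, ?, 'pending')")) {
            insert.setLong(1, orderId);
            insert.setString(2, payload);
            insert.executeUpdate();
        }

        // 2. Write to the message queue (hypothetical client call that blocks
        //    until the broker acknowledges the message).
        sendToQueue(payload);

        // 3. Only after the queue confirms, flip the status to "committed".
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE orders SET status = 'committed' WHERE id = ?")) {
            update.setLong(1, orderId);
            update.executeUpdate();
        }
    }

    private void sendToQueue(String payload) {
        // placeholder: message broker publish goes here
    }
}
```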
I'm afraid the answers above (VoiceOfUnreason, Udi Dahan) just sweep the problem under the carpet. The problem under the carpet is: how should the movement of data from the database to the queue be designed so that the message is posted exactly once (without XA)? If you solve this, then you can easily extend that concept with any additional business logic.
The CAP theorem tells you the limits clearly.
XA transactions are not a 100% bulletproof solution, but they seem to me the best of all the alternatives I have seen.
Adding to what @Gordon Linoff said, assuming durable messaging (something like MSMQ?), the method/handler is going to be transactional, so if it all succeeds, the message will be written to the queue and the data to your view model; if it fails, both will fail...
To mitigate the ID issue you will need to use GUIDs instead of DB-generated keys (if you are using messaging you will need to remove your referential integrity anyway and introduce GUIDs as keys).
One more suggestion: don't update the database, but insert only/upsert (the pending row and then the completed row) and have the reader project the data based on the latest row (for example), as sketched below.
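A small sketch of the insert-only idea with client-generated GUIDs; all table, column, and key names are made up. Each state change is a new row keyed by the same GUID, and the reader takes the latest row per key.

```java
import java.util.UUID;

public class InsertOnlyExample {
    public static void main(String[] args) {
        // Client-generated key: no dependency on a DB auto-increment value,
        // so the same id can be carried in the queued message.
        UUID orderId = UUID.randomUUID();

        // Conceptually (SQL shown as strings only):
        String pendingRow   = "INSERT INTO order_events (order_id, status, created_at) VALUES ('"
                + orderId + "', 'pending', NOW())";
        String completedRow = "INSERT INTO order_events (order_id, status, created_at) VALUES ('"
                + orderId + "', 'completed', NOW())";

        // The reader projects the current state by taking the newest row per order_id,
        // e.g. SELECT ... WHERE order_id = ? ORDER BY created_at DESC LIMIT 1.
        System.out.println(pendingRow);
        System.out.println(completedRow);
    }
}
```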
Writing the message as part of the transaction is a good idea, but it has multiple drawbacks:
a. your database/language may not support transactions
b. transactions are time-consuming operations
c. you cannot afford to wait for the queue's response while responding to your service call
d. if your database is already under stress, writing the messages will exacerbate the impact of the higher workload
The best practice is to use database streams. Most modern databases support streams (DynamoDB, MongoDB, Oracle, etc.). You have a consumer of the database stream running which reads from the stream and writes to the queue, invalidates the cache, adds to the search indexer, and so on. Once all of those succeed, you mark the stream item as processed. (A sketch of such a consumer follows the pros and cons below.)
Pros of this approach:
It works in the case of a multi-region deployment where there is a regional failure. (You should read from the regional stream and hydrate all the regional data stores.)
There is no overhead of writing extra records, and no performance bottleneck from the queues.
You can use this pattern for other downstream systems as well, such as caches, queues, and search indexes.
Cons:
You may need to call multiple services to construct the appropriate message.
One database stream might not be sufficient to construct the appropriate message.
You need to ensure the reliability of your streams; a Redis stream, for example, is not reliable.
NOTE: this approach also does not guarantee exactly-once semantics. The consumer logic should be idempotent and able to handle duplicate messages.
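As an illustration only (the answer mentions DynamoDB, MongoDB, and Oracle; this sketch uses the MongoDB Java driver's change streams, and the database name, collection name, and publish call are hypothetical):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;

public class ChangeStreamConsumer {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");

            // Tail the collection's change stream and forward each change to the queue.
            // If publishing fails, the loop stops and the stream can be resumed later,
            // which is where the at-least-once / idempotent-consumer caveat comes from.
            for (ChangeStreamDocument<Document> change : orders.watch()) {
                publishToQueue(change.getFullDocument());
            }
        }
    }

    private static void publishToQueue(Document doc) {
        // placeholder: hand the document to your message broker client here
        System.out.println("would publish: " + doc);
    }
}
```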
I am looking for something to use as a simple service registry and am considering etcd. For this use-case availability is more important than consistency. Clients must be able to read/write keys to any of the nodes even when the cluster is split. Can etcd be used in this way? It doesn't matter if some of the writes are lost when things come back together as they will be quickly updated by service "I am alive" heartbeat timers.
I'm also new to etcd. What I have noticed is that when a network partition happens, reads still work on the nodes which are not in the main quorum. They will see inconsistent data.
As for writes, they fail with a "Raft internal error".
I was studying up on Cassandra and I understand that it is a peer-to-peer database with no masters or slaves.
Each read/write is facilitated by a coordinator node, which then forwards the read/write request to the specific nodes determined by the replication strategy and snitch.
My questions are about the performance implications of this method:
1) Isn't there an extra hop?
2) Is the write buffered and then forwarded to the right replicas?
3) How does the performance change with different replication strategies?
4) Can I improve the performance by bypassing the coordinator node and writing to the replica nodes myself?
1) There will occasionally be an extra hop, but your driver will most likely have a token-aware strategy for selecting the coordinator, which will choose a coordinator that is a replica for the given partition.
2) The write is buffered, and depending on your consistency level you will not receive acknowledgment of the write until it has been accepted on multiple nodes. For example, with consistency level ONE you will receive an ACK as soon as the write has been accepted by a single node. The other nodes will have writes queued up and delivered, but you will not receive any info about them. In the case that one of those writes fails or cannot be delivered, a hint will be stored on the coordinator to be delivered when the replica comes back online. Obviously there is a limit to the number of hints that can be saved, so after long downtimes you should run repair.
With higher consistency levels the client will not receive an acknowledgment until the number of nodes required by the CL have accepted the write.
3) The performance should scale with the total number of writes. If a cluster can sustain a net 10k writes per second but has RF = 2, you can most likely only do 5k client writes per second, since every write is actually two. This happens regardless of your consistency level, since those writes are sent even though you aren't waiting for their acknowledgment.
4) There is really no way to get around the coordination. The token-aware strategy will pick a good coordinator, which is basically the best you can do. If you manually attempted to write to each replica, your write would still be replicated by each node which received the request, so instead of one coordination event you would get N. This is also most likely a bad idea, since I would assume you have a better network between your C* nodes than from your client to the C* nodes.
I don't have answers for 2 and 3, but as for 1 and 4:
1) Yes, this can cause an extra hop.
4) Yes, well, kind of. The DataStax driver, as well as the Netflix Astyanax driver, can be set to be token aware, which means it will listen to the ring's gossip to know which nodes have which token ranges and send the insert to a coordinator on a node it will be stored on, eliminating the additional network hop.
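For reference, a hedged sketch of that setting with the DataStax Java driver 3.x (the contact point is hypothetical; Astyanax has an equivalent option). Note that token awareness is best-effort: it needs the statement's routing key to pick a replica.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareClient {
    public static Cluster build() {
        // Wraps the default datacenter-aware policy so that, whenever possible,
        // the chosen coordinator is itself a replica for the statement's partition key.
        return Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                .build();
    }
}
```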
To add to Andrew's response, don't assume the coordinator hop is going to cause significant latency. Do your queries and measure. Think about consistency levels more than the extra hop. Tune your consistency for higher read or higher write speed, or a balance of the two. Then MEASURE. If you find latencies to be unacceptable, you may then need to tweak your consistency levels and/or change your data model.
I know that when Solr performs optimization, either explicitly via the optimize command or implicitly by Lucene due to the mergeFactor, readers are not blocked. That is, the server is still available for searching.
Is it also available for updates? Can other threads in my application send document updates to Solr, and possibly also send commits? Will those updates pass through into the index, or will they be blocked?
An old question, but some more info can help here.
The optimize command in Solr is a call to IndexWriter's forceMerge() method. This method does take a lock on the IndexWriter instance itself. However, the point is that adding documents does not require any lock on the IW instance, nor does it need the commitLock or fullFlushLock.
Moreover, even with forceMerge(), it is the ConcurrentMergeScheduler which picks up the merge process and runs it in a different thread altogether.
Usually the merge process (not forceMerge, which is not recommended anyway) needs to lock the IndexWriter instance only while preparing the merge info, when it needs to know which segments to take for the merge, what the new merged segment's name is, and so on. Once it has this information, the merge happens concurrently.
So, yes, you can keep adding documents even while an optimize is in progress; they will get buffered in RAM until the next commit/optimize or close() of the IndexWriter.
Having said that, I might as well add that you cannot have concurrent commits to different segments; that is, Lucene will do only one commit at a time. Adding documents does not flush them to any segment at all; it just puts them in the buffer.
The answer is "Yes". The server will respond to search requests, but updated documents will not show up in the search results until you send a commit command. The updated documents will stack up and be committed whenever a client/thread issues a commit command to the server. If you have multiple clients/threads issuing updates and commits they will not block each other, and the updates will show up as soon as the commit command completes.
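To make that concrete, a small SolrJ sketch (the core URL and field names are hypothetical): documents added while an optimize is running are buffered and become searchable only after the commit returns.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class UpdateDuringOptimize {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        // Documents can be added even while an optimize (forceMerge) is in progress;
        // they sit in the IndexWriter's RAM buffer until a commit.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("title", "added during optimize");
        solr.add(doc);

        // Only after this commit completes will the document show up in search results.
        solr.commit();

        solr.close();
    }
}
```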