How to prevent a lease from being used twice in a distributed system

I was wondering if it is possible to prevent a lease from being used twice.
What is a lease?
A lease is some sort of token that is given out to exactly one node at a time in a distributed setting. An example use case of a lease would be that only one node at a time can write to a database.
What is a fencing token and why use it?
With the lease as described above, we could run into problems. A node could stall (e.g. because of stop-the-world garbage collection) and still believe that its lease is valid, when in fact it is not. If we give each lease a fencing token (a monotonically increasing integer), the database can reject writes that carry an old fencing token.
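To make the idea concrete, here is a minimal sketch of what the storage side could do, assuming it simply remembers the highest fencing token it has accepted so far (all names are made up for the example):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical storage-side guard: accept a write only if its fencing token
// is at least as new as the highest token accepted so far.
class FencedStore {
    private final AtomicLong highestTokenSeen = new AtomicLong(-1);

    /** Returns true if the write was accepted, false if the token is stale. */
    boolean write(long fencingToken, String key, String value) {
        long current = highestTokenSeen.get();
        while (fencingToken >= current) {
            // Atomically raise the high-water mark, then apply the write.
            if (highestTokenSeen.compareAndSet(current, fencingToken)) {
                doWrite(key, value);           // actual storage update
                return true;
            }
            current = highestTokenSeen.get();  // lost a race, re-check
        }
        return false;                          // stale fencing token: reject
    }

    private void doWrite(String key, String value) { /* ... */ }
}
```

Note that this guard can only reject client 1 once the store has already seen a higher token, which is exactly the gap described below.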
My problem:
Image taken from Designing Data-Intensive Applications, p. 303
Here client 1 gets rejected because it is using an old fencing token. But this is ONLY because client 2 used its token before client 1, so the store already knows the fencing token has been incremented. How can we make sure that client 1 is rejected when using an outdated lease, even if it tries to write before client 2 does?

Related

max attempts of 10 was reached for request with last error being: RECEIPT_NOT_FOUND

I've upgraded my sdk to v2.20.0-beta.4, and I am receiving this error when submitting transactions.
I expect the transactions to succeed. They do if I downgrade the sdk to a stable release, so I am guessing this is a bug in the beta, which happens, but I'm more interested in what the error means.
It means the sdk tried to get a receipt for your transaction 10 times and eventually gave up. This could be for a number of reasons:
The transactionId you're asking a receipt for doesn't actually exist
The node the receipt query goes to has never seen that transaction
The communication between you and the node you're asking the receipt from is broken
You're asking for a receipt for a transaction that's more than 3 minutes old
Given that earlier versions of the SDK work fine, it's probably a bug; I'd encourage you to file an issue on the SDK with your findings.
While asking for a receipt is usually successful (even if the transaction failed), there are edge cases where a successful transaction will not be followed by a successful receipt request (for the reasons above). Those are not necessarily Hedera's fault; you could send a tx from a mobile device, lose network connectivity and then fail to fetch a receipt when you regain connectivity.
The belt-and-braces approach is to log the transaction ids in a persisted list, remove them from the list when a receipt is obtained, and, in the event a receipt cannot be obtained, check with a mirror node whether any transactions left in the list have succeeded (or not).
If a transaction in the list is more than 3 minutes old and there is no record of it on a mirror node, it hasn't been and never will be processed.
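A rough sketch of that bookkeeping, with the persisted list kept in memory for brevity and the mirror-node lookup left as a hypothetical interface (neither is part of the Hedera SDK):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PendingTransactionTracker {
    // In-memory stand-in for the persisted list; a real implementation would
    // write to a database so the list survives restarts.
    private final Map<String, Instant> pending = new ConcurrentHashMap<>();

    void beforeSubmit(String transactionId) {
        pending.put(transactionId, Instant.now());   // record before executing the transaction
    }

    void onReceiptObtained(String transactionId) {
        pending.remove(transactionId);                // receipt arrived, nothing left to do
    }

    /** Run periodically: reconcile transactions whose receipt never arrived. */
    void reconcile(MirrorNodeLookup mirrorNode) {
        for (Map.Entry<String, Instant> entry : pending.entrySet()) {
            boolean olderThanThreeMinutes =
                Duration.between(entry.getValue(), Instant.now())
                        .compareTo(Duration.ofMinutes(3)) > 0;
            if (mirrorNode.hasRecord(entry.getKey())) {
                pending.remove(entry.getKey());       // it succeeded after all
            } else if (olderThanThreeMinutes) {
                pending.remove(entry.getKey());       // will never be processed; resubmit from app logic
            }
        }
    }

    /** Hypothetical mirror-node query; not an actual Hedera SDK call. */
    interface MirrorNodeLookup {
        boolean hasRecord(String transactionId);
    }
}
```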

Very long camel redelivery policy

I am using Camel and I have a business problem. We consume order messages from an ActiveMQ queue. The first thing we do is check our DB to see if the customer exists. If the customer doesn't exist, a support team needs to populate the customer in a different system. Sometimes this can take 10 hours or even until the following day.
My question is how to handle this. At a high level, it seems I can either dequeue these messages, store them in our DB and re-run them at intervals (a custom-coded solution), or note the error in our DB and return them to the ActiveMQ queue with a long redelivery policy and expiration, say redeliver every 2 hours for 48 hours.
The second option would save a lot of code, but is it a sound approach, or could it lead to resource issues or to problems with not knowing where messages are?
This is a pretty common scenario. If you want insight into how the jobs are progressing, then it's best to use a database for this.
Your queue consumption should be really simple: consume the message, check if the customer exists; if so process, otherwise write a record in a TODO table.
Set up a separate route to run on a timer - every X minutes. It should pull out the TODO records, and for each record check if the customer exists; if so process, otherwise update the record with the current timestamp (the last time the record was retried).
This gives you a clear view of the state of the system, which you can then integrate into a console to see the state of the outstanding jobs.
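For illustration, the two routes could look roughly like this in Camel's Java DSL (endpoint URIs, bean names and the timer period are placeholders, not taken from the question):

```java
import org.apache.camel.builder.RouteBuilder;

public class OrderRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Route 1: consume an order; process it if the customer exists,
        // otherwise park it in a TODO table for later retries.
        from("activemq:queue:orders")
            .choice()
                .when().method("customerService", "exists")      // bean returning a boolean
                    .to("bean:orderService?method=process")
                .otherwise()
                    .to("bean:todoRepository?method=save")       // record the order plus a timestamp
            .end();

        // Route 2: every 15 minutes, re-check the parked orders.
        from("timer:todoRetry?period=900000")
            .to("bean:todoRepository?method=findPending")        // body becomes the list of parked orders
            .split(body())
                // process if the customer now exists, otherwise update the last-retried timestamp
                .to("bean:todoRetryService?method=retryOne")
            .end();
    }
}
```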
There are a couple of downsides with your Option 2:
you're relying on the ActiveMQ scheduler, which uses a KahaDB variant sitting alongside your regular store, and may not be compatible with your H/A setup (you need a shared file system)
you can't see the messages themselves without scanning through the queue, which is an antipattern (using a queue as a database); you may as well use a database, especially if you anticipate ever needing to selectively remove a particular message.

Perform reading from a paxos-based distributed cluster

Could anyone explain how to read contents from a distributed cluster?
I mean a distributed cluster whose consistency is guaranteed by the Paxos algorithm.
In a real-world application, how does a client read the contents it has written to the cluster?
For example, in a 5-server cluster, maybe only 3 of them have the newest data and the other 2 have old data due to network delay or something.
Does this mean the client needs to read from at least a majority of the nodes? With 5 servers, that means reading data from at least 3 servers and picking the copy with the newest version number?
If so, it seems quite slow, since you need to read 3 copies. How do real-world systems implement this?
Clients should read from the leader. If a node knows it is not the leader, it should redirect the client to the leader. If a node does not know who the leader is, it should throw an error, and the client should pick another node at random until it is told of, or finds, the leader. If a node thinks it is the leader, it is dangerous to answer a read from local state: it may have just lost connectivity to the rest of the cluster right when it hits a massive stall (CPU load, IO stall, VM overload, large GC, some background task, a server maintenance job, ...), such that it actually loses the leadership while replying to the client and gives out a stale read. This can be avoided by running a round of (multi)Paxos for the read.
Lamport clocks and vector clocks say you must pass messages to establish that operation A happens before operation B when they run on different machines; if you don't, they run concurrently. This provides the theoretical underpinning of why we cannot say a read from a leader is not stale without exchanging messages with a majority of the cluster. The message exchange establishes a "happened-before" relationship between the read and the next write (which may happen on a new leader due to a failover).
The leader itself can be an acceptor, so in a three-node cluster it just needs one response from one other node to complete a round of (multi)Paxos. It can send the messages in parallel and reply to the client when it gets the first response. The network between nodes should be dedicated to intra-cluster traffic (and the best you can get), so that this does not add much latency for the client.
There is an answer describing how Paxos can be used for a locking service that cannot tolerate stale reads or reordered writes, and which discusses a crash scenario, over at "some questions about paxos". Clearly a locking service cannot have reads and writes of the locks "running concurrently", hence why it does a round of (multi)Paxos for each client message, to strictly order reads and writes across the cluster.
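As a rough sketch of the "confirm with a majority before answering" idea from the paragraphs above (all types and the transport are made up; a real system would fold this into its (multi)Paxos message flow):

```java
import java.util.List;

// The leader serves a read only after a majority of acceptors confirm that its
// ballot is still the highest they have promised; otherwise the read may be stale.
class LeaderReadExample {

    interface Acceptor {
        /** True if this acceptor has not promised any ballot higher than the given one. */
        boolean confirmBallot(long ballot);
    }

    static final class NotLeaderException extends RuntimeException { }

    private final List<Acceptor> otherAcceptors;  // the rest of the cluster
    private final long myBallot;                  // ballot under which this node leads
    private volatile String committedValue;      // state machine output on this node

    LeaderReadExample(List<Acceptor> otherAcceptors, long myBallot, String initial) {
        this.otherAcceptors = otherAcceptors;
        this.myBallot = myBallot;
        this.committedValue = initial;
    }

    String read() {
        // The leader counts itself; in a three-node cluster a single positive
        // response from another acceptor already completes the majority.
        int confirmations = 1;
        int majority = (otherAcceptors.size() + 1) / 2 + 1;
        for (Acceptor acceptor : otherAcceptors) {        // in practice: sent in parallel
            if (acceptor.confirmBallot(myBallot) && ++confirmations >= majority) {
                return committedValue;                     // leadership confirmed, read is not stale
            }
        }
        throw new NotLeaderException();                    // lost leadership; client retries elsewhere
    }
}
```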

What are the common practice to handle Cassandra write failure?

In the doc [1], it was said that
if using a write consistency level of QUORUM with a replication factor
of 3, Cassandra will send the write to 2 replicas. If the write fails on
one of the replicas but succeeds on the other, Cassandra will report a
write failure to the client.
So assume only 2 replicas receive the update and the write failed. But due to eventual consistency, all the nodes will eventually receive the update.
So, should I retry? Or just leave it as is?
Any strategy?
[1] http://www.datastax.com/docs/1.0/dml/about_writes
Those docs aren't quite correct. Regardless of the consistency level (CL), writes are sent to all available replicas. If replicas aren't available, Cassandra won't send a request to the down nodes. If there aren't enough available from the outset to satisfy the CL, an UnavailableException is thrown and no write is attempted to any node.
However, the write can still succeed on some nodes while an error is returned to the client. The example from [1] is accurate if one replica was down before the write was attempted.
So assume only 2 replicas receive the update and the write failed. But
due to eventual consistency, all the nodes will eventually receive the
update.
Be careful though: a failed write doesn't tell you how many nodes the write was made to. It could be none, so the write may never propagate.
So, should I retry? Or just leave it as is?
In general you should retry, because it may not have been written at all. You should only regard your write as written when you get a successful response from the write.
If you're using counters though you should be careful with retries. Because you don't know if the write was made or not, you could get duplicate counts. For counters, you probably don't want to retry (since more often than not the write will have been made to at least one node, at least for higher consistency levels).
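For ordinary, non-counter writes, which are idempotent, the retry can be as blunt as a loop. A sketch assuming the DataStax Java driver 3.x (adapt it to whichever client you actually use; contact point, keyspace and statement are placeholders):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class RetryingWrite {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect("my_keyspace");

            // An idempotent write: replaying it cannot double-apply anything,
            // unlike a counter update, so blind retries are safe.
            Statement write = new SimpleStatement(
                    "INSERT INTO users (id, name) VALUES (42, 'alice')")
                    .setConsistencyLevel(ConsistencyLevel.QUORUM);

            int attempts = 0;
            while (true) {
                try {
                    session.execute(write);
                    break;                              // a quorum acknowledged the write
                } catch (WriteTimeoutException | UnavailableException e) {
                    // The write may or may not have reached some replicas;
                    // since it is idempotent, simply retry with a little backoff.
                    if (++attempts >= 5) {
                        throw e;                        // give up; surface the failure
                    }
                    Thread.sleep(1000L * attempts);
                }
            }
        } finally {
            cluster.close();                            // also closes any open sessions
        }
    }
}
```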
Retrying will not change much. The problem is that you cannot actually know whether the data was persisted at all, because Cassandra always throws the same exception.
You have a few options:
enable hints and retry the request with CL=ANY - a successful response would mean that at least a hint was created, so you know the data is there but may not yet be readable.
disable hints and retry with CL=ONE - a successful response would mean that at least one node received the data. In case of error, execute a delete.
use astyanax and their retry strategy
update to Cassandra 1.2 and use write-ahead log

Timing user tasks with seconds precision

I'm building a website where I need to time users' tasks, show them the time as it elapses and keep track of how long it took them to complete the task. The timer should be precise to the second, and an entire task should take about 3-4 hrs tops.
I should also prevent the user from forging the completion time (there is no money involved, so it's not really high-risk, but there is some risk).
Currently I use a timestamp to keep track of when the user began, and at the same time initialize a JS-based timer. When the user finishes I get a notice, and I calculate the difference between the current time and the beginning timestamp. This approach is no good: there is a few seconds' difference between the user's timer and my calculated time difference (i.e. the time I calculated it took the user to complete the task; note: this was only tested in my dev environment, since I don't have any other environment yet).
Two other approaches I considered are:
1. Relying entirely on a client-side timer (i.e. JS), and when the user completes the task, sending the time it took them, encrypted (so the user can't forge it). This doesn't seem very practical, since I can't figure out a way to generate a secret key on the client side which will really be "secret".
2. Relying entirely on a server-side timer, and sending "ticks" every second. This seems like a lot of server-side work compared to the other two methods (machine work, not human, e.g. accessing the DB on every "tick" to get the start time), and I'm also not sure it will be completely accurate.
EDIT:
Here's what's happening now in algorithm wording:
User starts task - server sends user a task id and records start time at db, client side timer is initialized.
User does task, his timer is running...
User ends task, timer is stopped and user's answer and task id are sent to the server.
Server retrieves start time (using received task id) and calculates how long it took user to complete task.
Problem - the time as calculated by the server and the time as displayed on the client side are different.
Any insight will be much appreciated.
If I've understood correctly the problem is that the server and client times are slightly different, which they always will be.
So I'd slightly tweak your original sequence as follows:
User starts task - server sends user a task id and records start time at db, client side timer is initialized.
User client notifies server of client start time; recorded in DB alongside server start time.
User does task, his timer is running...
User ends task, timer is stopped and user's elapsed time, answer and task id are sent to the server.
Upon receipt the server notes the incoming request time, retrieves the start time and calculates how long it took the user to complete the task, for both the server (start/finish) and client times.
Server ensures that the client value is within an acceptable range of the server-verified time and uses the client time. If the client time is not within an acceptable range (e.g. 30 seconds) then use the server times as the figure.
There will be slight differences in time due to latency, server load, etc., so using the client values will be more accurate, and it is just as secure because these values are sanity-checked.
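The acceptance check in that last step boils down to a few lines on the server; a sketch, with the 30-second tolerance and the names as placeholders:

```java
import java.time.Duration;
import java.time.Instant;

class TaskTimeValidator {
    private static final Duration TOLERANCE = Duration.ofSeconds(30);

    /**
     * Returns the elapsed time to record: the client-reported value when it is
     * within tolerance of what the server measured, otherwise the server value.
     */
    static Duration resolveElapsed(Instant serverStart, Instant serverEnd,
                                   Duration clientReportedElapsed) {
        Duration serverElapsed = Duration.between(serverStart, serverEnd);
        Duration difference = serverElapsed.minus(clientReportedElapsed).abs();
        return difference.compareTo(TOLERANCE) <= 0 ? clientReportedElapsed : serverElapsed;
    }
}
```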
To answer the comment:
You can only have one sort of accuracy, either accurate in terms of what the client/user sees, or accurate in terms of what the server knows. Anything coming from the client side could be tainted, so there has to be a compromise somewhere. You can minimise this by measurement and offsets, such that the end difference is within the same range as the start difference, using the server time, but it will never be 100% unchangeable. If it's really that much of an issue then store times with less accuracy.
If you really must have accuracy and reliability then the only way is to use the server time and periodically grab it via ajax for display and use a local timer to fill in the gaps with a sliding adjustment algorithm between actual and reported times.
I think this will work. Seems like you've got a synchronization issue and also a cryptography issue. My suggestion is to work around the synchronization issue in a way invisible to the user, while still preserving security.
Idea: Compute and display the ticks client side, but use cryptographic techniques to prevent the user from sending a forged time. As long as the user's reported time is close to the server's measured time, just use the user's time. Otherwise, claim forgery.
Client asks server for a task.
Server gets the current timestamp and encrypts it with a secret key known only to the server (a public key would not work here, since the client could use it to encrypt a forged timestamp of its own). This is sent back to the client along with the task (which can be plain text).
The client works until they are finished. Ticks are recorded locally in JS.
The client finishes and sends the server back its answer, the number of ticks it recorded, and the encrypted timestamp the server first sent it.
The server decrypts the timestamp, and compares it with the current local time to get a number of ticks.
If the server's computed number of ticks is within some tolerance (say, 10 seconds, to be safe), the server accepts the user's reported time. Otherwise, it knows the time was forged.
Because the user's time is accepted (so long as it is within reason), the user never knows that the server time could be out of sync with their reported time. Since the time periods you're tracking are long, losing a few seconds of accuracy doesn't seem like it will be an issue. The method requires only the encryption of a single timestamp, so it should be fast.
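One way to implement that opaque, tamper-evident timestamp is an HMAC computed with a key only the server knows; a minimal sketch (token format, key handling and error signalling are all illustrative):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Tamper-evident start-time token of the form "startMillis.signature".
class StartTimeToken {
    private final SecretKeySpec serverKey;   // known only to the server

    StartTimeToken(byte[] keyBytes) {
        this.serverKey = new SecretKeySpec(keyBytes, "HmacSHA256");
    }

    /** Issued together with the task; the client stores it and echoes it back unchanged. */
    String issue(long startMillis) throws Exception {
        return startMillis + "." + sign(Long.toString(startMillis));
    }

    /** Verifies the echoed token and returns the elapsed seconds, or -1 if it was tampered with. */
    long verify(String token, long nowMillis) throws Exception {
        String[] parts = token.split("\\.", 2);
        if (parts.length != 2) {
            return -1;                                        // malformed token
        }
        byte[] expected = sign(parts[0]).getBytes(StandardCharsets.UTF_8);
        byte[] provided = parts[1].getBytes(StandardCharsets.UTF_8);
        if (!MessageDigest.isEqual(expected, provided)) {
            return -1;                                        // signature mismatch: forged
        }
        long startMillis = Long.parseLong(parts[0]);          // safe: the value was signed by us
        return (nowMillis - startMillis) / 1000;              // server-verified elapsed seconds
    }

    private String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(serverKey);
        return Base64.getEncoder().encodeToString(
                mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The server then compares the elapsed time verified from the token with the client-reported tick count, exactly as in the tolerance step above.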
The only way to prevent cheating is not to trust the client at all, but simply to calculate the final time on the server as the time taken from before sending the task to the client to after receiving the result.
This also implies that the final time has to include some network transmission delays, as unfair as that might seem: if you try to compensate for them somehow, the client can always pretend to be suffering from more delays than it actually is.
What you can do, however, is try to ensure that the network delays won't come as a surprise to the user. Below is a simple approach which completely prevents cheating while ensuring, given some mild assumptions about clock rates and network delays, that the running time shown on the client side when the results are submitted should approximately match the final time calculated on the server:
1. Client starts timer and requests task from server.
2. Server records current time and sends task to client.
3. User completes task.
4. Client sends result to server and (optionally) stops timer.
5. Server accepts result and subtracts timestamp saved in step 2 from current time to get final time.
6. Server sends final time back to client.
The trick here is that the client clock is started before the task is requested from the server. Assuming that the one-way network transmission delay between steps 1 and 2 and steps 4 and 5 is approximately the same (and that the client and server clocks run at approximately the same rate, even if they're not in sync), the time from step 1 to 4 calculated at the client should match the time calculated on the server from step 2 to 5.
From a psychological viewpoint, it might even be a good idea to keep the client clock running past step 4 until the final time is received from the server. That way, when the running clock is replaced by the final time, the jump is likely to be backwards, making the user happier than if the time had jumped even slightly forwards.
The best way to prevent the client from faking the timestamp is simply to never let them have access to it. Use a timestamp generated by your server when the user starts. You could store this in the server's RAM, but it would probably be better to write this into the database. Then when the client completes the task, it lets the server know, which then writes the end timestamp into the database as well.
It seems like the important information you need here is the difference between the start and end times, not the actual start and end times. And if those times are important, then you should definitely be using a single device's time-tracking mechanism: the server's time. Relying upon the client's time prevents the values from being comparable to each other due to differences in time zones. Additionally, it's too easy for the end user to fudge their time (accidentally or intentionally).
Bottom Line: There is going to be some inaccuracy here. You must compromise when you need to satisfy so many requirements. Hopefully this solution will give you the best results.
Clock synchronization
This is what you are looking for; see the Wikipedia explanation.
And here is the solution for JavaScript.

Resources