I am doing a POC to implement a system like this:
A user has a credit balance.
A server makes an HTTP request on the user's behalf; when the response status is 200, it marks the operation as completed and subtracts the user's credit.
The HTTP request could be a very critical operation (like placing an order on an exchange).
It's a very simple system, but things get tricky in the real world, where everything is unreliable.
Approach:
Open a transaction and lock the user's row first -> subtract the credit -> send the HTTP request.
If the response is an error, send the request again, which is fine here because the third-party server never processed it.
If the third-party server returns 200, commit the transaction. However, there is a window between receiving the response status and committing the transaction. Even if that window is only 50 ns, if the system crashes inside it, my server will never know the HTTP request was successful, so it will retry (meaning a second request).
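A minimal sketch of that flow (assuming SQLite's BEGIN IMMEDIATE as a stand-in for row locking, and a made-up users table and order URL) shows exactly where the crash window sits:

```python
import sqlite3
import requests

# isolation_level=None -> no implicit transactions; we manage them explicitly.
conn = sqlite3.connect("app.db", isolation_level=None)

def place_order(user_id: int, cost: int, url: str) -> None:
    # BEGIN IMMEDIATE takes the write lock up front (SQLite's closest
    # analogue to SELECT ... FOR UPDATE row locking in Postgres/MySQL).
    conn.execute("BEGIN IMMEDIATE")
    try:
        (credit,) = conn.execute(
            "SELECT credit FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if credit < cost:
            raise ValueError("insufficient credit")
        conn.execute(
            "UPDATE users SET credit = credit - ? WHERE id = ?", (cost, user_id)
        )

        resp = requests.post(url, json={"user_id": user_id}, timeout=10)
        resp.raise_for_status()  # non-2xx -> roll back; safe to retry later

        # >>> crash window: the order is already placed upstream, but the
        # >>> COMMIT below has not happened yet. Crash here and, on restart,
        # >>> the credit is untouched and a retry would place a second order.
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```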
Am I facing the Two Generals' Problem here? I have been thinking about it a lot, and I can't see any way to guarantee correctness.
I am curious how banking systems around the world deal with this. How can they guarantee the client never loses a penny?
Related
I understand that an Azure Standard stateful Logic App workflow runs asynchronously, but can I use a stateful Standard Logic App for the scenario below?
We want to receive JSON data from a third party in an HTTP POST request, then process it and store it in Azure Data Lake. The problem is that, since an Azure Standard stateful workflow runs asynchronously, as soon as the HTTP trigger is hit it returns status 202 Accepted. I want to send the caller the end status of the request. For example, I want to send 500 Internal Server Error when the request was valid but the workflow still failed due to an internal error, and 200 OK if the data was processed successfully. I don't want to always send HTTP status 202 Accepted to the caller; I want the caller to know what exactly happened to their HTTP request. Is this possible with a Standard Logic App? I don't want to use a Consumption Logic App for security reasons.
You can achieve this using the 'run after' configuration; with it enabled, an action runs even after an earlier part of the workflow has failed.
Go to your workflow and open the menu for the action that should run regardless of whether the previous one fails, times out, or is skipped (a Condition in my case), then choose 'Configure run after'.
For instance, in my logic app the code view for that action looks like the snippet below (the designer and run-output screenshots are omitted here).
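As a rough illustration (action names are made up and the action body is abbreviated; the runAfter block is the part 'Configure run after' edits for you):

```json
"Condition": {
    "type": "If",
    "expression": "...",
    "actions": "...",
    "runAfter": {
        "Parse_JSON": [
            "Succeeded",
            "Failed",
            "Skipped",
            "TimedOut"
        ]
    }
}
```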
UPDATED ANSWER
In that case, too, you can use the same runAfter concept, with a condition that treats 'status code is not equal to 200' as the true branch, and continue the flow from there.
Here are the logic app and its output (screenshots omitted).
I have a Flink application that consumes incoming messages from a Kafka topic with multiple partitions, does some processing, then sends them to a sink that forwards them over HTTP to an external service. Sometimes the downstream service is down, and stream processing needs to stop until it is back in action.
There are two approaches I am considering.
1. Throw an exception when the HTTP sink fails to send the output message. This will cause the task and job to restart according to the configured restart strategy. Eventually the downstream service will be back and the system will continue where it left off.
2. Have the sink sleep and retry on failure; it can do this continually until the downstream service is back.
From what I understand, and from my PoC, with 1 I will lose exactly-once guarantees, since the sink itself is external state. As far as I can see, you cannot make a simple HTTP endpoint transactional, as it would need to be to implement TwoPhaseCommitSinkFunction.
With 2 this is less of an issue, since the pipeline will not proceed until the sink makes a successful write, and I can rely on back pressure throughout the system to pause the retrieval of messages from the Kafka source.
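For illustration only, the blocking retry in approach 2 might look like this (a framework-agnostic Python sketch of the idea; a real Flink sink would implement SinkFunction in Java/Scala):

```python
import time
import requests

def send_with_retry(payload: dict, url: str, base_delay: float = 1.0) -> None:
    """Block until the downstream service accepts the record.

    While this loop spins, the operator emits nothing downstream, so back
    pressure propagates up to the Kafka source and consumption pauses.
    """
    delay = base_delay
    while True:
        try:
            resp = requests.post(url, json=payload, timeout=10)
            resp.raise_for_status()
            return  # success: let the pipeline proceed
        except requests.RequestException:
            time.sleep(delay)               # service is down; wait, try again
            delay = min(delay * 2, 60.0)    # capped exponential backoff
```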
The main questions I have are:
Is it a correct assumption that you can't make a TwoPhaseCommitSinkFunction for a simple HTTP endpoint?
Which of the two strategies, or neither, makes the most sense?
Am I missing simpler obvious solutions?
I think you can try AsyncIO in Flink - https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/.
Try to make the HTTP endpoint send its response only once all work for the request is done, e.g. the HTTP server has finished processing the request and committed the result to the DB. Then use an async HTTP client in the AsyncIO operator. The AsyncIO operator will wait until the response is received. If any error happens, the Flink streaming pipeline will fail and restart based on the recovery strategy.
All requests to the HTTP endpoint that have not yet received a response sit in the internal buffer of the AsyncIO operator; if the streaming pipeline fails, the pending requests in the buffer are saved in the checkpoint state. The operator also triggers back pressure when the internal buffer is full.
I was building a service that runs on Cloud Run that is triggered by PubSub through EventArc.
Pub/Sub guarantees at-least-once delivery, and it retries whenever the acknowledgement deadline expires. This deadline is set in the subscription details.
When a service receives a Pub/Sub request (delivered as a POST request to the service), there are two points at which we could send an acknowledgement back:
At the beginning of the request, as soon as it is received. The service would then continue to process the request at its own pace. However, this article points out that:
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So sending a response at the beginning may not be an option.
After the request has been processed by the service. This would mean that, depending on what the service does, we cannot always predict how long processing will take. Hence we cannot set the acknowledgement deadline correctly, resulting in Pub/Sub retries and duplicate requests.
So what is the best practice here? Is there a better way to handle this?
Best practice is generally to ack a message once the processing is complete. In addition to the Cloud Run limitation you linked, consider that if the endpoint acked a message immediately upon receipt and then an error occurred in processing it, your application could lose that message.
To minimize duplicates, you can set the ack deadline to an upper bound of the processing time. (If your endpoint ends up processing messages faster than this, the ack deadline won’t rate-limit incoming messages.) If the 600s deadline is not sufficient, you could consider writing the message to some persistent storage and then acking it. Then, a separate worker can asynchronously process the messages from persistent storage.
Since you are concerned that you might not be able to set the correct acknowledgement deadline, you can use modify_ack_deadline() in your code to dynamically extend the deadline while the process is still running. You can refer to this document for sample code implementations.
Be wary that the maximum acknowledgement deadline is 600 seconds, so make sure that your processing in Cloud Run does not exceed that limit.
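A minimal sketch of that call with the Python client (project and subscription names are placeholders):

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Placeholder project/subscription; substitute your own.
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def extend_deadline(ack_id: str, seconds: int = 120) -> None:
    # Push the ack deadline out while long-running work is in progress.
    # Note the hard ceiling of 600 seconds per extension.
    subscriber.modify_ack_deadline(
        request={
            "subscription": subscription_path,
            "ack_ids": [ack_id],
            "ack_deadline_seconds": seconds,
        }
    )
```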
Acknowledgements do not apply to Cloud Run, because acks are for "pull subscriptions", where a process continuously pulls from the Cloud Pub/Sub API.
To get events from Pub/Sub into Cloud Run, you use "push subscriptions", where Pub/Sub makes an HTTP request to Cloud Run and waits for it to finish.
In this push scenario, Pub/Sub already knows it made a request to you (you received the event), so it does not need an acknowledgement of receipt. However, if your service returns a failure response code (e.g. HTTP 500), Pub/Sub will make another request to retry (and this is configurable on the push subscription itself).
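In other words, the response code is the ack. A minimal push-endpoint sketch (Flask, with a hypothetical handle() standing in for the actual processing):

```python
from flask import Flask, request

app = Flask(__name__)

def handle(message: dict) -> None:
    ...  # hypothetical worker: replace with the real processing

@app.route("/", methods=["POST"])
def pubsub_push():
    envelope = request.get_json()
    try:
        handle(envelope["message"])  # do ALL the work before responding
    except Exception:
        # Non-2xx tells Pub/Sub the delivery failed; it will retry.
        return "processing failed", 500
    # Any 2xx acknowledges the push delivery.
    return "", 204
```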
I'm developing a voting application with GAE, which involves sending email to each voter. In my initial tests, I went over the per-minute email quota, and this exception was raised:
OverQuotaError: The API call mail.Send() required more quota than is available.
I was able to solve this in the short term by enabling billing, which greatly increases the per-minute email quota, but what is the right way to prevent such an exception from being raised in the future? If my app becomes wildly successful and I exceed the larger quota, it would be a big problem to have this exception raised.
I don't want to put the call to send emails in a try/except block, since this happens after processing a form, and I don't want the user to wait around for the response to the POST.
Is this a good use case for a task queue? If so, would I put a request to send a batch of emails in the task queue, or would each request to send a single email go in the task queue? The former seems better in that processing the POST would be faster. Regardless of which way I do it, would I add a delay between sending each email to ensure they are not sent too fast and I go over quota?
Yes, this is ideally suited to the task queue, as you can limit the rate at which your emails are sent out by changing the properties in queue.yaml; see the sketch below.
One email per task would be best, so if a task fails and is retried, it only re-sends the one failed email, not the whole batch.
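As a sketch (queue name and rate are made up), the throttled queue in queue.yaml could look like:

```yaml
queue:
- name: email-queue
  rate: 10/m        # at most 10 tasks dispatched per minute
  bucket_size: 1    # no bursts above the steady rate
```

and the form handler would enqueue one task per recipient, e.g. (handler URL and recipient list are hypothetical):

```python
from google.appengine.api import taskqueue

for voter_email in voter_emails:
    taskqueue.add(
        queue_name="email-queue",
        url="/tasks/send_email",
        params={"to": voter_email},
    )
```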
Yes, use a task queue. If a task sends an email, you can decide how many tasks should run per minute, and if a task fails it will be retried automatically.
We wrote a screen-sharing application in C++ based on sending screenshots.
It works by establishing a TCP connection between the server and the client; the server forwards every new screenshot it receives for a user through the connection, and the client pops it up.
Now we are trying to host this on Google App Engine, and therefore need to 'servlet'-ize and 'sandbox' the server code, so as to implement this forwarding through HTTP requests.
I imagine the following:
1. POST request with the screenshot as a multipart form (Apache uploads ...).
But now the server needs to contact the specified client (who is logged in) to send/forward the screenshot.
I'm not sure how to 'initiate' such a connection from the servlet to the client. The client doesn't run any servlet environment (of course).
I know HTTP 1.1 maintains a persistent TCP connection, but it seems GAE won't let me use it.
One approach that comes to mind is to send a 100 Continue to every logged-in user at login, and respond with the screenshot once it arrives. Upon receipt, the client makes another request, and so on.
An alternative (inspired by setting the refresh header for a browser) would be to have the app poll on a regular basis (every 5 seconds).
You're not going to be able to do this effectively on GAE.
Problem 1: All output is buffered until your handler returns.
Problem 2: Quotas & Limits:
Some features impose limits unrelated to quotas to protect the stability of the system. For example, when an application is called to serve a web request, it must issue a response within 30 seconds. If the application takes too long, the process is terminated and the server returns an error code to the user. The request timeout is dynamic, and may be shortened if a request handler reaches its timeout frequently to conserve resources.
Comet support is on the product roadmap, but to me your app still seems like a poor fit for a GAE application.
Long Polling is the concept used for such asynchronous communications between server and client.
In long polling, the servlet keeps a map of clients to their pending messages: the key is the client id and the value is the list of messages to be sent to that client. When a client opens a connection to the server (sends a request to the servlet), the servlet checks the map for any messages addressed to it. If it finds some, it sends them to the client and returns from the method. On receiving the messages, the client immediately opens a new connection to the server. If the servlet finds no messages for the given client, it waits until the map is updated with messages for that client.
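The same pattern in a minimal, framework-agnostic Python sketch (the question concerns Java servlets; this only illustrates the per-client message map, and all names are made up):

```python
import queue
import threading

# One blocking queue of pending messages per client id.
_clients: dict = {}
_lock = threading.Lock()

def _queue_for(client_id: str) -> queue.Queue:
    with _lock:
        return _clients.setdefault(client_id, queue.Queue())

def handle_long_poll(client_id: str, timeout: float = 30.0):
    # Block until a message arrives for this client, or time out so the
    # client can reconnect (the "wait until the map is updated" step).
    try:
        return _queue_for(client_id).get(timeout=timeout)
    except queue.Empty:
        return None  # client should immediately issue a new poll request

def push_screenshot(client_id: str, png_bytes: bytes) -> None:
    # Called by the upload handler when a new screenshot arrives for a client.
    _queue_for(client_id).put(png_bytes)
```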
This is a late reply, I'm aware, but I believe that Google have an answer for this requirement: the Channel API.