Should I use async code or sync code in Apache Flink?

When my application interacts with I/O (a database, a third-party API, ...), I use async I/O, as Flink recommends: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html.
But my application usually interacts with the DB. Should I always use async?
I have several questions:
If I use async code (CompletableFuture), my application is not blocked the way it is with sync code. In terms of performance, is async code better than sync code?
How does performance compare if I use sync code and increase the parallelism?
When I use async code, after a while my application throws the exception "Mailbox is in state CLOSED, but is required to be in state OPEN for put operations". Is it related to opening many threads?

Using asynchronous I/O is better for these reasons:
Better resource utilization. If you make synchronous requests then one task will be handling just one request at a time. With asynchronous requests, a single task can be handling dozens of in-flight requests.
While your user code is blocked waiting for a response to a synchronous request, that operator cannot participate in checkpointing. In the best case this makes checkpointing slow, and it can lead to checkpoint timeouts and job failure.
Yes, you can make synchronous I/O work by increasing the parallelism. But that's throwing resources at a problem that has a better solution.
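To make the resource-utilization point concrete, here is a minimal plain-Java sketch (not Flink's API) of what "many in-flight requests per task" means. A single scheduler thread stands in for a non-blocking database client, `lookup` is a hypothetical query, and a `Semaphore` caps the number of outstanding requests, roughly the role played by the `capacity` argument of Flink's `AsyncDataStream.unorderedWait(...)` / `orderedWait(...)`. With a capacity of 5, twenty lookups take about four round trips instead of the twenty a one-at-a-time synchronous loop would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only, not Flink's API: many lookups share one thread instead of one thread each.
final class BoundedAsyncLookups {
    // One scheduler thread stands in for a non-blocking DB client: it completes each
    // future after a delay without parking a thread per outstanding request.
    private final ScheduledExecutorService client = Executors.newSingleThreadScheduledExecutor();
    private final Semaphore permits;                        // caps in-flight requests (cf. Flink's capacity)
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    BoundedAsyncLookups(int capacity) { this.permits = new Semaphore(capacity); }

    // Hypothetical non-blocking query: returns at once, completes about latencyMs later.
    private CompletableFuture<String> lookup(int key, long latencyMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        client.schedule(() -> result.complete("row-" + key), latencyMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // Issues n lookups, never more than `capacity` outstanding, and returns results in key order.
    List<String> lookupAll(int n, long latencyMs) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int key = 0; key < n; key++) {
            permits.acquireUninterruptibly();               // back-pressure once capacity is reached
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            pending.add(lookup(key, latencyMs).whenComplete((row, err) -> {
                inFlight.decrementAndGet();
                permits.release();
            }));
        }
        List<String> rows = new ArrayList<>();
        for (CompletableFuture<String> f : pending) rows.add(f.join());
        return rows;
    }

    int maxInFlight() { return maxInFlight.get(); }

    void shutdown() { client.shutdown(); }
}
```

In a real job the non-blocking client call would live inside an `AsyncFunction`, which completes its `ResultFuture` from the client's callback. This sketch models only the concurrency, not Flink's fault tolerance.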
As for the Mailbox problem, I believe this can only occur if the job is shutting down. I think this is a side effect of some other problem that has caused the job to fail. Maybe look around in the logs for other indications of what's going on.

Related

Concurrency & Parallelism in AppEngine

I am learning App Engine and have created a Spring-based application with a controller that accepts all incoming requests. The controller has just one method, and it is used to populate 5 tables in BigQuery. So I have 5 separate methods to insert data into BigQuery, and I call them one at a time, sequentially, from my controller method. But I want to execute these 5 BQ methods in parallel, not in sequence. How can I achieve this kind of parallelism in an App Engine app?

There are two different strategies you can use on GAE: concurrency and deferred processing. Both have a few flavours.
Concurrency
There are two basic flavours of this: relying on async APIs, or creating background threads.
Most of the GAE platform APIs are asynchronous (or can be), so you can invoke several of them at once and then block until they have all resolved. In this case, you could make 5 asynchronous calls to BigQuery using the UrlFetchService.
GAE also allows the creation of background threads for the duration of a request. All threads must complete before the result is returned to the client. This is generally the least idiomatic approach for GAE.
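Here is a minimal sketch of the fan-out/fan-in shape in plain Java with `CompletableFuture`: start all five inserts, then block until every one has finished. `insertIntoTable` and the table names are hypothetical stand-ins for the five BigQuery insert methods, and the fixed thread pool is an assumption. Whether and how you may create threads depends on your App Engine runtime, and if your BigQuery client offers async calls you could combine those futures instead of using a pool.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class ParallelInserts {
    // Records which tables were "written"; a stand-in for BigQuery in this sketch.
    static final ConcurrentLinkedQueue<String> written = new ConcurrentLinkedQueue<>();

    // Hypothetical placeholder for one of the five BigQuery insert methods.
    static void insertIntoTable(String table, String payload) {
        try {
            Thread.sleep(50);                                   // simulated network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        written.add(table);
    }

    // Fan out: start every insert at once. Fan in: return once all have finished.
    static void insertAll(List<String> tables, String payload) {
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());
        try {
            List<CompletableFuture<Void>> inserts = tables.stream()
                .map(t -> CompletableFuture.runAsync(() -> insertIntoTable(t, payload), pool))
                .collect(Collectors.toList());
            // allOf completes when every insert has completed; join() throws if any of them failed.
            CompletableFuture.allOf(inserts.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

The controller's total latency becomes roughly that of the slowest insert rather than the sum of all five.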
Deferred processing
GAE offers two flavours of task queue, push and pull.
Push queues are basically queued tasks, each executed by requesting a specified URL at a rate you control. They can participate in transactions and have retry rules, etc., and they can be used to ensure a workload is executed, but independently of the initiating request. This is the most idiomatic solution for the general problem of 'background work' on GAE.
Pull queues are queues that wait for an initiating request to pull some data out for processing, usually in bulk. That request is typically triggered by a cron job.
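To illustrate the retry-rules part of push queues, here is a toy in-memory model in plain Java. It is not the App Engine Task Queue API. `ToyPushQueue`, `drain`, and `maxAttempts` are hypothetical, the `handler` predicate stands in for the target URL's response (true meaning success), and rate limiting is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Toy in-memory model of push-queue retry semantics; NOT the App Engine Task Queue API.
final class ToyPushQueue {
    private static final class Task {
        final String payload;
        int attempts;
        Task(String payload) { this.payload = payload; }
    }

    private final Deque<Task> queue = new ArrayDeque<>();
    private final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    ToyPushQueue(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // The initiating request only enqueues and returns; the work happens later in drain().
    void enqueue(String payload) { queue.addLast(new Task(payload)); }

    // Delivers each task to `handler` (stand-in for the target URL; true = success).
    // Failed tasks go to the back of the queue until they have been tried maxAttempts times.
    int drain(Predicate<String> handler) {
        int succeeded = 0;
        while (!queue.isEmpty()) {
            Task task = queue.pollFirst();
            task.attempts++;
            if (handler.test(task.payload)) {
                succeeded++;
            } else if (task.attempts < maxAttempts) {
                queue.addLast(task);                            // retry later
            } else {
                deadLetters.add(task.payload);                  // give up
            }
        }
        return succeeded;
    }

    List<String> deadLetters() { return deadLetters; }
}
```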
In your case, your best bet is to use async HTTP requests, unless you're using an SDK/API wrapper that doesn't expose them. In that case, look to task queues. Almost any app you build will end up using them anyway, and they're very graceful and simple to comprehend.

Queuing requests when network connection is slow or fails

I have an AngularJS/Cordova application for iOS which makes calls to a remote API.
Sometimes our users have slow or low-quality connections on their mobile phones, and therefore cannot perform a certain action on the phone.
During this time, the user can tap other buttons, which fire off more network requests; these queue up and cause the application to hang.
What would be the best way to help remedy this situation? I was thinking of the following options:
Requests that time out after a certain n seconds will simply be aborted.
Use debounce to wait n msecs before firing off a request, and cancel the timer if the user does something else (this still wouldn't account for failed requests that are made on the slow network).
Add failed/timed out requests to a queue, and send them later when there is a more reliable connection (not sure how to accomplish this).
Does anyone know of any other solutions -- and any ideas on implementation?
Any advice highly appreciated. Thanks!

I think the simplest and safest way to prevent the user from invoking additional requests is to use a loader that blocks the page until you get a response or an error. It doesn't demand complicated logic, and therefore needs little maintenance.
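The idea of allowing one request at a time, with the UI blocked until a response or an error arrives, boils down to an in-flight guard plus a timeout. Below is a minimal sketch of that logic, written in Java rather than AngularJS. `SingleFlightGuard` is hypothetical, `isBusy()` marks where a page would show its blocking loader, and the timeout (via `CompletableFuture.orTimeout`, Java 9+) covers the "request hangs on a slow network" case from the question.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: allows one outstanding request; extra taps are ignored until it settles.
final class SingleFlightGuard {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final long timeoutMs;

    SingleFlightGuard(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // True while a request is outstanding; this is when the page would show its blocking loader.
    boolean isBusy() { return busy.get(); }

    // Starts the request if none is in flight, otherwise returns Optional.empty().
    // The guard is released on success, on error, or when the timeout fires.
    <T> Optional<CompletableFuture<T>> tryRun(Supplier<CompletableFuture<T>> request) {
        if (!busy.compareAndSet(false, true)) {
            return Optional.empty();
        }
        CompletableFuture<T> call;
        try {
            call = request.get();
        } catch (RuntimeException e) {
            busy.set(false);                                    // request never started
            throw e;
        }
        return Optional.of(call.orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                               .whenComplete((result, error) -> busy.set(false)));
    }
}
```

A timed-out request completes exceptionally with a `TimeoutException`, which is the natural point to show an error or put the request aside for a later retry.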

Use a POST method in an MSSQL trigger

I need to POST (HTTP method) some info to an external URL when a trigger executes.
I know there are a lot of security and performance implications when using triggers, so I am afraid this is not the place to do this kind of processing. But anyway, I am posting this to get some feedback or ideas on how to approach the problem. Some considerations:
The transaction fired in the trigger could be asynchronous.
The process must take care of the authorization
The end URL is a php script on the internet.
What really triggers this execution should be an insert or an update of one record to a table, so I must use this trigger since I can't touch the (third party) application.
On a side note, could Service Broker be something to consider?
Any ideas will be welcome.

You are right, this is not something you want to do in a trigger. The last thing you want in your application is to introduce the latency of an HTTP request into every update/insert/delete, which will be very visible even when things work well. But when things go wrong, they will go very wrong: the added coupling will cause your application to fail when the HTTP resource has availability problems, and even worse are the correctness issues related to rollbacks (the transaction that executed the trigger may roll back, but the HTTP call has already been made).
This is why it is paramount to introduce a layer that decouples the trigger from the HTTP call, and this is done via a queue. Whether it is a table used as a queue, a Service Broker queue, or even an MSMQ queue is your call. The simplest solution is to use a table as a queue:
the trigger enqueues (inserts) a request for the HTTP call to be made
after the transaction that ran the trigger commits, the request is available to dequeue
an external application that monitors (polls) the queue picks up the request and places the HTTP call
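Here is a minimal Java sketch of the external poller from the last step. The `HttpRequestQueue` interface is a hypothetical view of the queue table; in practice its two methods would be JDBC queries against that table. `httpPost` stands in for whatever HTTP client and authorization you use. An in-memory implementation is included so the logic can run on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical view of the queue table the trigger inserts into (backed by JDBC queries in practice).
interface HttpRequestQueue {
    Optional<String> peekOldest();   // oldest pending request body, if any
    void deleteOldest();             // remove it once it has been delivered
}

// In-memory stand-in so the sketch runs without a database.
final class InMemoryQueue implements HttpRequestQueue {
    private final Deque<String> rows = new ArrayDeque<>();
    void insert(String body) { rows.addLast(body); }        // what the trigger would do
    int size() { return rows.size(); }
    @Override public Optional<String> peekOldest() { return Optional.ofNullable(rows.peekFirst()); }
    @Override public void deleteOldest() { rows.pollFirst(); }
}

// The external process: drain the queue, placing one HTTP call per committed request.
final class QueuePoller {
    private final HttpRequestQueue queue;
    private final Predicate<String> httpPost;   // hypothetical: true if the POST succeeded

    QueuePoller(HttpRequestQueue queue, Predicate<String> httpPost) {
        this.queue = queue;
        this.httpPost = httpPost;
    }

    // One polling pass; call it on a timer. Returns how many requests were delivered.
    int pollOnce() {
        int delivered = 0;
        Optional<String> next;
        while ((next = queue.peekOldest()).isPresent()) {
            if (!httpPost.test(next.get())) {
                break;                               // leave it queued and retry on the next pass
            }
            queue.deleteOldest();                    // dequeue only after the call succeeded
            delivered++;
        }
        return delivered;
    }
}
```

Deleting a row only after the POST succeeds gives at-least-once delivery. If the poller dies between the call and the delete, the request is sent again on restart, so the PHP endpoint should tolerate duplicates.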
The advantage of Service Broker over custom tables-as-queues is Internal Activation, which would allow your HTTP handling code to run on demand when there are items to be processed in the queue, instead of polling. But making the HTTP call from inside the engine, via SQLCLR, is quite ill-advised. An external process is much better suited to accessing something like HTTP, and therefore the added complexity of Service Broker is not warranted.

Why do major DB vendors not provide truly asynchronous APIs?

I work with Oracle and MySQL, and I struggle to understand why the APIs are not written so that I can issue a call, go away and do something else, and then come back and pick up the result later (e.g. NIO). Instead, I am forced to dedicate a thread to waiting for data. It seems that SQL interfaces are the only place where synchronous I/O is still forced, which means tying up a thread waiting for the DB.
Can anybody explain the reasons for this? Is there something fundamental that makes this difficult?
It would be great to be able to use 1-2 threads to manage my DB query issue and result fetch, rather than use worker threads to retrieve data.
I do note that there are two experimental attempts (e.g. adbcj) at implementing an async API, but none seem to be ready for production use.

Database servers should be able to handle thousands of clients. To provide an asynchronous interface, the DB server would need to keep the result set from the query in memory so that you can pick it up at a later stage. It would quickly run out of resources.

A considerable problem with async is that many, many libraries use thread-locals (ThreadLocal) for transactions.
For example, in Java much of the JDBC specification relies on synchronous behavior to achieve a single thread per transaction. That is, you write your transaction in procedural order.
To do it right, transactions would have to be done through callbacks, but they are not. The only platform I know of that does this is node.js, but it's unclear whether it's really async.
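A small Java sketch of the thread-local problem: `CURRENT_TX` is a hypothetical stand-in for a library that binds the current transaction to the calling thread. A callback that runs on another thread simply doesn't see that binding.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalTxDemo {
    // Hypothetical stand-in for a library that binds "the current transaction" to the calling thread.
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    // Returns {tx seen on the caller's thread, tx seen inside an async callback on a pool thread}.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT_TX.set("tx-42");
        try {
            String onCaller = CURRENT_TX.get();
            // The callback runs on a pool thread, whose ThreadLocal slot was never set.
            String inCallback = CompletableFuture.supplyAsync(CURRENT_TX::get, pool).join();
            return new String[] {onCaller, inCallback};
        } finally {
            CURRENT_TX.remove();
            pool.shutdown();
        }
    }
}
```

`run()` yields `"tx-42"` for the caller and `null` for the callback, which is why thread-bound transaction management and callback-style async code don't mix without explicitly passing the transaction along.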
Of course, even if you do go async, I'm not sure it will really improve performance, as the database itself is probably doing the work synchronously.
There are lots of ways to avoid thread over-population in Java; see:
Is asynchronous jdbc call possible?
Personally to get around this issue I use a Message Bus like RabbitMQ.
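Here is one common workaround, sketched in Java: keep the blocking JDBC calls, but confine them to a small dedicated pool and hand callers `CompletableFuture`s. `BlockingCallBridge` is hypothetical, and any `Callable` stands in for a JDBC query. This does not make the database asynchronous. It only bounds how many threads sit waiting on it, and sizing the pool to match the connection pool is a typical choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical async facade over blocking calls (e.g. JDBC): callers get futures,
// while a small fixed pool does all of the blocking.
final class BlockingCallBridge implements AutoCloseable {
    private final ExecutorService dbPool;

    BlockingCallBridge(int threads) { this.dbPool = Executors.newFixedThreadPool(threads); }

    <T> CompletableFuture<T> submit(Callable<T> blockingCall) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingCall.call();        // still blocks a pool thread, just not the caller's
            } catch (Exception e) {
                throw new CompletionException(e);  // surface checked exceptions (e.g. SQLException) via the future
            }
        }, dbPool);
    }

    @Override
    public void close() { dbPool.shutdown(); }
}
```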

Silverlight Threading and its usage

Scenario: I am working on an LOB application. In Silverlight every service call is async, so the UI is automatically not blocked while the request is processed on the server side.
Silverlight also supports threading. As I understand it, in an LOB application threads are most useful when you need to do some I/O, but since I am not building an OOB (out-of-browser) application, I cannot access client resources, and all server requests are async by default.
In the above scenario, is there any use for threading? Or can anyone provide a good example where threading improves performance?
I have searched a lot on this topic, but everywhere I have only found simple threading examples from which it is very difficult to understand the real benefit.
Thanks for your help.

Tomasz Janczuk has also pointed out that if the UI thread is fairly busy, you can significantly improve the performance even of async WCF calls by marshaling them onto a separate thread. And I should note that the UI thread can get awfully busy doing things that you wouldn't anticipate would chew up cycles, like calculating drop-shadows and what-not, so this might be worth investigating (and measuring) for your application.
That said, I've been writing LOB apps for the better part of two decades, and synchronous IO aside, I haven't found a lot of scenarios where adding multiple threads in an LOB application was worth the additional complexity.
Edit 4/2/10: I had lunch with Tomasz Janczuk and some other folks from the WCF team the other day, and they clarified a few issues for me about how WCF works with Silverlight background threads. There are two things to be concerned with: sending data, and receiving it (say, from duplex callbacks or async call completions). When you send data, the call will always be made from the thread that actually makes the call. So if you have a lot of data that needs to be serialized, you might get a small performance boost by marshaling the outgoing call onto a background thread (say, by using ThreadPool.QueueUserWorkItem). But it's not likely to be a substantial performance boost.
However, when you receive data, either through a duplex callback, or through an async xxxCompleted method, the data is always received on the thread on which the connection was originally opened. This means that if you're opening the connection explicitly, it will receive data on that thread; but if you're opening the connection implicitly, it will receive data on the thread on which you made your first outbound connection. This won't make a lot of difference if you need to update the UI on every callback, since you'd just have to marshal the call back onto the UI thread. But if there are times when you just need to store the data for future reference or processing, you can get yourself a significant performance boost by opening your connection on a separate thread, so that you can receive and process callbacks without waiting on the UI thread.
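The general pattern here is to do the expensive work off the UI thread and then marshal the result back onto it. Silverlight's own mechanisms (such as `ThreadPool.QueueUserWorkItem`) are C#, so this is a Java sketch of the same shape. A single-thread executor plays the UI thread, and `expensiveSerialize` is a hypothetical placeholder for work such as serializing a large payload.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicReference;

final class UiMarshalling {
    // Hypothetical placeholder for expensive work such as serializing a large payload.
    static String expensiveSerialize(String payload) {
        return payload + "|len=" + payload.length();
    }

    // Runs the expensive step on `worker`, then hops back onto the single `ui` thread to
    // update UI state. Returns the name of the thread that applied the update.
    static CompletableFuture<String> process(String payload, ExecutorService worker,
                                             ExecutorService ui, AtomicReference<String> uiLabel) {
        return CompletableFuture
            .supplyAsync(() -> expensiveSerialize(payload), worker)   // off the UI thread
            .thenApplyAsync(serialized -> {                           // marshaled back onto it
                uiLabel.set(serialized);
                return Thread.currentThread().getName();
            }, ui);
    }
}
```

If a callback only needs to store data for later, the second hop can be skipped so the UI thread is never involved.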
Hope this helps. Thought I'd write it down while I still have it reasonably fresh in my head.

The same advantages apply to Silverlight as to other applications. If you are doing a long-running calculation on the client and don't want to tie up the main/UI thread, then threading is an obvious choice.
Also, I haven't researched it, but I would imagine if you are running a multi-core machine, you could improve performance by splitting work into multiple separate threads.
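For the multi-core point, here is a minimal Java sketch that splits a CPU-bound loop into one chunk per thread and combines the partial results. The sum of squares is only a placeholder workload. Whether this actually speeds things up depends on the core count and on the chunks being large enough to outweigh the threading overhead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitWork {
    // Sums i*i for i in [0, n) by giving each thread one contiguous chunk of the range.
    static long parallelSumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads;             // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = lo; i < hi; i++) sum += (long) i * i;
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // wait for and combine each chunk
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Java's parallel streams can do this chunking for you; the explicit version just makes the split visible.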
