Flushing data during snapshotState() in CheckpointedFunction - apache-flink

I have code that flushes some data to an external source in
@Override
public void snapshotState(FunctionSnapshotContext context) {
    flush(); // to external source
}
Sometimes this external source is unavailable and my Flink app stalls completely. I see that only one snapshotState can run at a time, but does that halt the whole app from processing data?
I tried looking in the Flink documentation on fault tolerance (https://nightlies.apache.org/flink/flink-docs-release-1.11/learn-flink/fault_tolerance.html) but found nothing :(
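One way to keep a hung flush() from stalling the checkpoint indefinitely is to bound it with a timeout. A minimal sketch, assuming a single-thread executor and an arbitrary 10-second limit — none of this is Flink-prescribed API, and flush() is the same placeholder as in the question:

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public abstract class BoundedFlushFunction implements CheckpointedFunction {

    private transient ExecutorService flushExecutor;

    @Override
    public void initializeState(FunctionInitializationContext context) {
        flushExecutor = Executors.newSingleThreadExecutor();
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        Future<?> pending = flushExecutor.submit(this::flush);
        try {
            // Bound the external call so a hung endpoint cannot block here forever.
            pending.get(10, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            // Failing the checkpoint (or the job, depending on the configured
            // tolerable checkpoint failures) beats stalling indefinitely.
            throw new IOException("flush() to external source timed out", e);
        }
    }

    protected abstract void flush(); // writes buffered records to the external source
}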

Related

Flink streaming: how to control the execution time

Spark Streaming provides an API for termination, awaitTermination(). Is there a similar API available to gracefully shut down Flink streaming after some t seconds?
Your driver program (i.e. the main method) in Flink doesn't stay running while the streaming job executes. Your program should define a dataflow, call execute, and then terminate. In Spark, the driver program stays running (AFAIK), and awaitTermination relates to that.
Note that a Flink streaming dataflow continues to execute indefinitely, unless you're using a 'bounded' data source with a finite number of elements. You may also cancel or stop a job, and even take a checkpoint upon stopping to be resumed from later.
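A minimal sketch of that lifecycle (the class name and the bounded source are illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)   // bounded source: three elements, then done
           .map(n -> n * 2)
           .print();

        // main() only defines the dataflow and submits it here; with this
        // bounded source the job also finishes on its own. An unbounded
        // source (e.g. a message queue) would run until cancelled or stopped.
        env.execute("bounded example");
    }
}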

Data logging application, server component - separate thread or separate process?

I am writing a data logging application which reads some values from an external device and saves them to a file periodically. Also, I would like for the application to have a server component that would make current readings accessible over TCP/IP.
The application is (being) written in C in a unix-like environment.
I am not sure whether the server should run as a separate process (forking itself away after start) and use some IPC to obtain the data, or whether it would be better off as a separate thread.
What ingredients go into such a decision?
Thanks!
If you are after real-time, stay away from "another" process, as this just introduces another hop in the data path, which slows transmission down.
Have one process that instantiates a reader thread, which pulls data from the device and pushes it into an internal buffer (probably double-buffered, depending on the device's capabilities).
Then have a logger thread and a sender thread reading from this internal buffer, as in the sketch below.
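Here is the shape of that design, sketched in Java for brevity (the same layout maps onto pthreads in C); readSample(), writeToFile(), and sendToClients() are placeholders for the device, file, and TCP/IP code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LoggerDaemon {
    public static void main(String[] args) {
        BlockingQueue<double[]> toLogger = new ArrayBlockingQueue<>(1024);
        BlockingQueue<double[]> toSender = new ArrayBlockingQueue<>(1024);

        // Reader: pulls from the device and fans out to both consumers.
        Thread reader = new Thread(() -> {
            while (true) {
                double[] sample = readSample();
                toLogger.offer(sample); // offer() drops samples rather than
                toSender.offer(sample); // blocking if a consumer falls behind
            }
        });

        // Logger: periodically persists samples to a file.
        Thread logger = new Thread(() -> {
            try {
                while (true) writeToFile(toLogger.take());
            } catch (InterruptedException e) { /* shutting down */ }
        });

        // Sender: serves current readings to TCP/IP clients.
        Thread sender = new Thread(() -> {
            try {
                while (true) sendToClients(toSender.take());
            } catch (InterruptedException e) { /* shutting down */ }
        });

        reader.start();
        logger.start();
        sender.start();
    }

    private static double[] readSample() { /* device I/O */ return new double[0]; }
    private static void writeToFile(double[] s) { /* file write */ }
    private static void sendToClients(double[] s) { /* server side */ }
}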

Loading and saving application configuration on the GUI thread

A common approach to saving and loading of user / application settings in a GUI application looks like this:
private void FormMain_Load(object sender, EventArgs e)
{
    m_foo = Properties.Settings.Default.Foo;
}

private void FormMain_FormClosing(object sender, FormClosingEventArgs e)
{
    Properties.Settings.Default.Save();
}
At first glance, this looks reasonable. However, once you consider what actually transpires inside the save and (implicit) load operations, namely disk I/O, it appears a bit suspect. Doesn't it stand against the principle of avoiding potentially long running operations in general and especially I/O on the GUI thread?
I realize that in the vast majority of cases we're talking about very small files located on a local hard drive, but I can come up with scenarios where the operation would take some time to complete if I really wanted to (disk was put to sleep, disk is under stress, disk is actually network storage, etc).
Also, it's not clear what I should do about it. Startup is easy enough to handle: disable the GUI while the settings load asynchronously. But what about the close event? Sure, I could cancel the event, save asynchronously, and then close the form myself, but it gets a little messy: I need to handle the user trying to close again before the save has finished (so I don't start another save, or exit in the middle of one). And I'm not sure it can even be done simply in other frameworks, say GTK#, where OnDeleteEvent typically fires when it's already too late. I suppose I could fire a foreground thread and save on it, but then the user might think the application has closed and run it again before the settings were actually saved (and there could be other consequences of having multiple instances alive regardless).
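For illustration, the shape of that cancel-then-save-then-close dance, sketched here in Java Swing for brevity (saveSettings() is a placeholder; a WinForms version would hook FormClosing analogously):

import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.swing.JFrame;
import javax.swing.SwingUtilities;
import javax.swing.WindowConstants;

public class MainFrame extends JFrame {
    private final AtomicBoolean saving = new AtomicBoolean(false);

    public MainFrame() {
        setDefaultCloseOperation(WindowConstants.DO_NOTHING_ON_CLOSE);
        addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                // Ignore repeated close attempts while a save is in flight.
                if (!saving.compareAndSet(false, true)) return;
                CompletableFuture
                    .runAsync(MainFrame.this::saveSettings) // off the GUI thread
                    .whenComplete((v, err) ->
                        SwingUtilities.invokeLater(MainFrame.this::dispose));
            }
        });
    }

    private void saveSettings() { /* the disk I/O happens here */ }
}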
Should I be worried about such scenarios or am I overthinking it?
You are over-thinking it, presumably based on the assumption that saving settings is slow. It is not in a WinForms app; you are not running this app on a mobile device. File writes are always cached by the file system cache, so a write is nothing but a memory-to-memory copy. They run at memory bus speeds, 5 gigabytes/sec at the slowest (old DDR2), more typically around 35 GB/sec.
After which the file system driver lazily writes the changes to disk, highly optimized to minimize write-head movement. Writes of up to ~1 gigabyte can fit in the cache on modern machines, much more if the machine has enough RAM. An easy six orders of magnitude more than you'll ever need for a user.config file.
Any optimization that turns it into an async write can never improve your app's responsiveness by more than a millisecond, which is completely unobservable by a human. That is also the reason this isn't supported by System.Configuration. You don't have a real problem.
I think you're forgetting the "why" of the general rule (which is the danger with all general rules). The "why" behind "the principle of avoiding potentially long running operations in general and especially I/O on the GUI thread" is to maintain responsiveness (low latency) in the user interaction. If you do long-running work in the GUI thread then your GUI won't refresh as quickly.
What user interaction is needed while the application is loading (before it's ready to run), or while it is closing (when it can no longer be interacted with)? There are scenarios where there is a legitimate answer (interacting with another window in the same app, etc.), but these are a minority.
Also, there are various buffers that can help mitigate any latency here. The config file has likely (though not necessarily) been read into a disk buffer before your app reads it, and disk output only needs to reach the nearest buffer before the app can close.

Apache Camel Producer Consumer terminology dilemmas

The following is the definition about a producer and a consumer given in Camel in Action book.
The consumer could be receiving the message from an external service, polling for the message on some system, or even creating the message itself. This message then flows through a processing component, which could be an enterprise integration pattern (EIP), a processor, an interceptor, or some other custom creation. The message is finally sent to a target endpoint that’s in the role of a producer. A route may have many processing components that modify the message or send it to another location, or it may have none, in which case it would be a simple pipeline.
My doubts:
What is an External Service?
How does the consumer come into play before the producer produces the message? My understanding was that a producer produces and transforms a message in an exchange so that the message is compatible with the consumer's endpoint.
Why does a consumer have to do a producer's work (that is, transform a message and send it on to a producer)? Shouldn't it be the other way around?
Thanks!
An external service could be, for example, an external web service, an external REST service, an EJB, and so on.
A Consumer could be consuming from any of those services, or it could be listening for a file (or files) to be created in a specific place on the file system, it could be consuming from a message queue (JMS), etc, etc - there are endless possibilities limited only by the components and endpoints available.
Basically, with Apache Camel you are designing a message bus (ESB), right? You can think of it like this: the "consumer" takes stuff from the outside world and puts it on the bus.
Then your message goes through various routes (most probably being translated and modified along the way via EIPs), and eventually it has to go some place else "out there" in the real world - that's when the producer does its job.
The consumer consumes onto the bus; the producer produces off of the bus.
Usually you don't need to think too much about whether an endpoint is operating as a producer or as a consumer - just use .from and .to as you need, and everything should work fine from there.
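For example, a minimal route in the Java DSL (the file and JMS endpoints here are illustrative):

import org.apache.camel.builder.RouteBuilder;

public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:orders/inbox")          // consumer: takes files onto the bus
            .convertBodyTo(String.class)   // a trivial processing step
            .to("jms:queue:orders");       // producer: pushes the message back out
    }
}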
Also have a read of this answer: Apache Camel producers and consumers
I hope this helps!

Azure StorageClient Transient Connection testing - hanging

I am testing my WPF application connecting to Azure Blob Storage to download a bunch of images using TPL (tasks).
It is expected that in the live environment, internet connectivity at the deployed locations will be highly transient.
I have set Retry Policy and time-out in BlobRequestOptions as below:
//Note the values here are for test purposes only
//CloudRetryPolicy is a custom method returning adequate Retry Policy
// i.e. retry 3 times, wait 2 seconds between retries
blobClient.RetryPolicy = CloudRetryPolicy(3, new TimeSpan(0, 0, 2));
BlobRequestOptions bro = new BlobRequestOptions() { Timeout = TimeSpan.FromSeconds(20) };
blob.DownloadToFile(LocalPath, bro);
The above statements are in a background task that works as expected, and I have appropriate exception handling in the background task and the continuation task.
In order to test the exception handling and my recovery code, I simulate internet disconnection by pulling out the network cable. I have hooked a method up to the System.Net.NetworkChange.NetworkAvailabilityChanged event on the UI thread, and I can detect connection/disconnection as expected and update the UI accordingly.
My problem is: if I pull the network cable while a file is being downloaded (via blob.DownloadToFile), the background thread just hangs. It does not time out, does not crash, does not throw an exception, nothing!!! As I write this, I have been waiting ~30 mins and no response/processing has happened in relation to the background task.
If I pull the network cable before the download starts, execution is as expected, i.e. I can see retries happening, exceptions raised and passed along, and so on.
Has anyone experienced similar behaviour? Any tips/suggestions to overcome this behaviour/problem?
By the way, I am aware that I can cancel the download task on detecting loss of network connectivity, but I do not want to do this, as connectivity may be restored within the time-out duration and the download can then continue from where it was interrupted. I have tested this auto-resumption and it works nicely.
Below is a simplified indication of my code structure:
private void btnClick(object sender, EventArgs e)
{
    // declare the background task and attach the continuation
    var backgroundTask = Task.Factory.StartNew(BackgroundWork);
    backgroundTask.ContinueWith(ContinuationWork);
}

private void BackgroundWork()
{
    try
    {
        // ... connection setup ...
        blob.DownloadToFile(LocalPath, bro);
    }
    catch (Exception ex)
    {
        // ... exception handling ...
        // In case of connectivity loss while the download is in progress,
        // this block is not executed; the debugger just sits idle
        // without a current statement.
    }
}

private void ContinuationWork(Task antecedent)
{
    if (antecedent.IsFaulted)
    {
        // ... do recovery work ...
        // This works as expected if connectivity is lost before the
        // download starts, but is never reached if connectivity is lost
        // while the file transfer is taking place.
    }
    else
    {
        // ... further processing ...
    }
}
Avkash is correct, I believe. Also, to be clear, you will basically never see that network-removed error, so there is not much point in testing for it. You will see a ton of connection rejections, conflicts, missing resources, read-only accounts, throttles, access-denied errors, even DNS resolution failures, depending on how you are handling storage accounts. You should test for those.
That being said, I would suggest you not use the RetryPolicy at all with blob or table storage. Most of the errors you will actually encounter are not retryable to begin with (e.g. 404, 409, 403). When you have a retry policy in place, it will by default actually try the operation 4 more times over the next 2 minutes. There is no point in retrying bad credentials, for instance.
You are far better off simply handling the error and retrying selectively yourself (timeouts and throttles are about the only things where it makes sense), as in the sketch below.
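A sketch of that selective retry, in Java for brevity (isTimeoutOrThrottle() is a placeholder for whatever status-code inspection your storage client exposes):

import java.util.concurrent.Callable;

public final class SelectiveRetry {
    public static <T> T run(Callable<T> op, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // 403/404/409-style errors fail immediately; only failures
                // that can plausibly succeed on a second attempt are retried.
                if (!isTimeoutOrThrottle(e) || attempt == maxAttempts) throw e;
                Thread.sleep(2000L * attempt); // simple linear back-off
            }
        }
    }

    private static boolean isTimeoutOrThrottle(Exception e) {
        return false; // placeholder: check for timeouts / HTTP 500 / 503 here
    }
}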
Your problem is mainly caused by the fact that the Azure storage client library uses file-streaming classes underneath, which is why the API hang is not directly related to the Windows Azure Blob client library itself. Calling the file-streaming APIs directly over the network, you can see exactly the same behaviour when the network cable is suddenly removed; removing the network gracefully produces different behaviour.
If you search the internet you will find that these streaming classes do not detect network loss, which is why your code should watch for the network-disconnect event and then stop the background streaming thread itself.
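As a rough sketch of that wiring (in Java for brevity; in the C# app the trigger would be the NetworkChange.NetworkAvailabilityChanged handler already in place):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DownloadWatchdog {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private volatile Future<?> inFlight;

    public void startDownload(Runnable blockingDownload) {
        inFlight = worker.submit(blockingDownload);
    }

    // Call this from the network-availability event handler.
    public void onNetworkLost() {
        Future<?> current = inFlight;
        if (current != null) {
            current.cancel(true); // interrupt the hung streaming call
        }
    }
}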
