Using a custom DataAbortHandler in Windows CE - ARM

We're developing an application based on Windows CE. At the moment we're fighting numerous data abort exceptions that only occur in release builds. We only have a limited number of development devices that actually output their debug stream to the serial port. Now we're wondering if it is possible to use OEMDataAbortHandler to access the content of the exception (i.e. everything that is written to the debug stream) in order to gather the data for diagnostic purposes.
Ideally we'd be able to create a text file containing data like this:
Exception 'Data Abort' (4): Thread-Id=05a70002(pth=8252169c),
Proc-Id=03cf000e(pprc=824f3d70) 'XXXX.exe', VM-active=03cf000e(pprc=824f3d70) 'XXXX.exe'
PC=400323cc(coredll.dll+0x000223cc) RA=4003361c(coredll.dll+0x0002361c) SP=0102f27c,
BVA=6464646c
Now, the signature of OEMDataAbortHandler is:
void OEMDataAbortHandler(void);
Is there any way to get access to the data written to the debug stream?

You should be able to use Structured Exception Handling (__try/__except) to filter the data abort exception. The processor state at the time of the exception is available through the CONTEXT record (the ContextRecord member of the EXCEPTION_POINTERS structure returned by the GetExceptionInformation intrinsic). See the documentation for __try/__except.

Related

How to get specific values (e.g. battery2, servo outputs) available in Mission Planner through DroneKit?

I am currently using dronekit-python to implement somewhat of a Mission Planner clone, as an API. I've generally been able to replicate most important features from Mission Planner; however, some features don't seem to be present.
One such feature is reading live servo outputs, which can be done in Setup > Mandatory Hardware > Servo Output (image below). I have been able to emulate getting/setting the output's function, min, trim, max, and reversed values through parameters. However, I cannot seem to access the live position values through dronekit. How would you go about this?
A second feature is reading specific values from the plane, beyond the class attributes present. This is available in Mission Planner when double-clicking a value in the Quick pane in order to change what measurement is displayed (image below). For my use case, I'd like to specifically access battery_voltage2 and battery_remaining2, as these are vital measurements for our system. I tried using vehicle.battery in dronekit, but this seems to only display data from battery 1. Any ideas?
Thank you so much for the help!
It might be possible to get the battery information and other information from the drone by using MAVLink messages. For battery information, look at the BATTERY_STATUS (#147) MAVLink message. For servo information, look at the SERVO_OUTPUT_RAW (#36) message.
In order to receive these messages, look into using message listeners from dronekit-python. You should be able to receive and parse the MAVLink messages.
In general, you can use message listeners and the dronekit-python message factory to receive and send MAVLink messages, which gives you more control than some of the built-in dronekit functions. If you decide to control the drone this way, though, be careful, because it's pretty easy to mess up your logic and have the drone behave unexpectedly.
Hope this helps!

Handling poison messages in Apache Flink

I am trying to figure out the best practices for dealing with poison messages / unhandled exceptions in Apache Flink. We have a job doing real-time event processing of location data from IoT devices. There are two potential scenarios where this can arise:
Data is bad in some way - e.g. invalid value
Data triggers a bug due to some edge case we have not anticipated.
Currently, all my data processing stops because of just one message.
I've seen two suggestions:
Catch the exceptions - this requires wrapping every piece of logic with something that catches every runtime exception
Use side outputs as a kind of DLQ - from what I can tell, this seems to be a variation on #1, where I have to catch all the exceptions and send them to the side output.
Is there really no way to do this other than wrap every piece of logic with exception handling? Is there no generic way to catch exceptions and not have processing continue?
I think the idea is not to catch all kinds of exceptions and send them elsewhere, but rather to have well-tested and functioning code and use dead letters only for invalid inputs.
So a typical pipeline would be
source => validate => ... => sink
                 \=> dead letter queue
As soon as your record passes your validate operator, you want all errors to bubble up, as any error in these operators may result in corrupted aggregates and data that, once written, cannot easily be reverted.
The validate step would work with either of the two approaches that you outlined. Typically, side outputs have better semantics, but you may end up with more code.
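For illustration, here is a minimal sketch of the side-output variant of such a validate step. The Event type, the events stream, the isValid check, and deadLetterSink are placeholders for your own types and logic, not anything from the Flink API:

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Tag for the dead letter side output; the anonymous subclass preserves the type information.
final OutputTag<Event> deadLetters = new OutputTag<Event>("dead-letters") {};

SingleOutputStreamOperator<Event> validated = events
        .process(new ProcessFunction<Event, Event>() {
            @Override
            public void processElement(Event value, Context ctx, Collector<Event> out) {
                if (isValid(value)) {
                    out.collect(value);             // valid records continue downstream
                } else {
                    ctx.output(deadLetters, value); // invalid records go to the DLQ
                }
            }
        });

// Route the dead letters to a dedicated sink, e.g. a Kafka topic or a file.
validated.getSideOutput(deadLetters).addSink(deadLetterSink);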
Now you may have a service with high SLAs and actually want it to produce output even if it is corrupted, just to produce data. Or you may have a simple transformation pipeline where you'd miss some events but keep the majority (and downstream can deal with incomplete data). Then you are right that you need to wrap the code of all operators with try-catch. However, you'd typically still only do it for the fragile operators and not for all of them. Trivial operators should be tested and then trusted to work. Further, you'd usually catch only specific kinds of exceptions, to limit the scope to the expected exceptions that can happen.
You might wonder why Flink doesn't have it incorporated as a default pattern. There are two reasons as far as I can see:
If Flink silently ignores any kind of exception and sends an extra message to a secondary sink, how can Flink ensure that the throwing operator is in a sane state afterwards? How can it avoid any kind of leaks that may happen because cleanup code is not executed?
It's more common in Java to let developers explicitly reason about exceptions and exception handling. It's also not straightforward to see what the requirements are: Do you want to have the input only? Do you also want to store the exception? What about the operator state that may have influenced the outcome? Should Flink still fail when too many errors have been received in a given time window? It quickly becomes a huge feature for something that should not happen at all in an ideal world where high-quality data is ingested and properly processed.
So while it looks easy for your case because you exactly know which kinds of information you want to store, it's not easy to have a solution for all purposes, especially since the extra code that a user has to write is tiny compared to the generic solution.
What you could do is extract most of the complicated logic into a single ProcessFunction and use side outputs as you have outlined. Since it's a central piece, you'd only need to write the side-output function once. If it's needed multiple times, you could extract a helper function where you pass your actual code as a RunnableWithException lambda which hides all the side-output logic, as sketched below. Make sure you use plenty of finally blocks to ensure a sane state.
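A minimal sketch of such a helper; the names SafeProcessFunction, errorOutput, and tryProcess are all illustrative, not part of the Flink API:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.OutputTag;
import org.apache.flink.util.function.RunnableWithException;

// Base class that owns the side-output plumbing; subclasses only supply the fragile logic.
public abstract class SafeProcessFunction<IN, OUT> extends ProcessFunction<IN, OUT> {
    protected final OutputTag<IN> errorOutput;

    protected SafeProcessFunction(TypeInformation<IN> inType) {
        // The explicit TypeInformation avoids type-erasure issues with a generic OutputTag.
        this.errorOutput = new OutputTag<>("errors", inType);
    }

    // Runs the given logic; on an expected failure, the input record is sent
    // to the error side output instead of failing the whole job.
    protected void tryProcess(IN value, Context ctx, RunnableWithException logic) throws Exception {
        try {
            logic.run();
        } catch (IllegalArgumentException e) { // catch only the expected exception types
            ctx.output(errorOutput, value);
        }
    }
}

A concrete function would then call something like tryProcess(value, ctx, () -> out.collect(transform(value))) from its processElement, so the try-catch and side-output logic is written exactly once.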
I'd also add quite a few integration test cases and use mutation testing to harden your pipeline more quickly. If you keep your test data inline, the mutants may also closely simulate your unexpected data issues, so that your validate operator becomes more complete.

Using Broadcast State To Force Window Closure Using Fake Messages

Description:
Currently I am working on using Flink with an IoT setup. Essentially, devices are sending data such as (device_id, device_type, event_timestamp, etc.) and I don't have any control over when the messages get sent. I then key the stream by device_id and device_type to perform aggregations. I would like to use event time, given that it ensures the timers which are set trigger deterministically after a failure. However, given that this isn't always a high-throughput stream, a window could be opened for a 10-minute aggregation period but not receive its next point until approximately 40 minutes later. Although the aggregation would eventually be completed, it would output my desired result extremely late.
So my workaround for this is to create an additional external source that does nothing other than pump out fake messages. By pumping out these fake messages in alignment with my 10-minute aggregation period, even if a device hadn't sent any data, the event-time windows would have something to force them closed. The critical part here is to make it possible for all parallel instances / operators to have access to this fake message, because I need to close all the windows with this single fake message. I was thinking that broadcast state might be the most appropriate way to accomplish this goal, given: "Broadcast state is replicated across all parallel instances of a function, and might typically be used where you have two streams, a regular data stream alongside a control stream that serves rules, patterns, or other configuration messages." Quote Source
Questions:
Is broadcast state the best method for ensuring all parallel instances (e.g. windows) receive my fake messages?
Once the operators have access to this fake message via the broadcast state, can this fake message then be used to advance the event-time watermark?
You can make this work with broadcast state, along the lines you propose, but I'm not convinced it's the best solution.
In an ideal world I'd suggest you arrange for the devices to send occasional keepalive messages, but assuming that's not possible, I think a custom Trigger would work well here. You can extend the EventTimeTrigger so that in addition to the event time timer it creates via
ctx.registerEventTimeTimer(window.maxTimestamp());
you also create a processing time timer, as a fallback, and you FIRE the window if the window still exists when that processing time timer fires.
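A minimal sketch of that idea, extending Trigger directly and re-implementing the event-time behavior (EventTimeTrigger itself has a private constructor); timeoutMs is an illustrative parameter:

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class EventTimeTriggerWithTimeout extends Trigger<Object, TimeWindow> {
    private final long timeoutMs; // wall-clock fallback interval

    public EventTimeTriggerWithTimeout(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        // Normal event-time behavior...
        ctx.registerEventTimeTimer(window.maxTimestamp());
        // ...plus a processing-time fallback in case no further events arrive.
        ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + timeoutMs);
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        // The watermark never reached the end of the window, so force it to fire.
        return TriggerResult.FIRE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        // A production version would also track and delete its processing-time timers.
    }
}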
I'm recommending this approach because it's simpler and more directly addresses the specific need. With the broadcast state approach you'll have to introduce a source for these messages, add a broadcast state descriptor and stream, add special fake watermarks for the non-broadcast stream (set to Watermark.MAX_WATERMARK), connect the broadcast and non-broadcast streams and implement a BroadcastProcessFunction (that probably doesn't really do anything), etc. It's a lot of moving parts spread across several different operators.

AsyncIO Exceptions in Apache Flink

In Apache Flink, I'm using the RichAsyncFunction for data enrichment. In the case of errors/exceptions, I want to funnel those error records into an error stream. I can see that other functions have a "side output" for this sort of scenario, but how is it handled in RichAsyncFunction? I also see uses of ResultFuture<>.completeExceptionally, but what does this do or mean when it occurs? Does the stream stop? Is it just logged? What is the state with regard to the output element of the stream? All the docs seem to just show the happy path or call completeExceptionally with no explanation of what happens next. What is the proper way to handle/capture errors in a RichAsyncFunction?
Thanks!

Handling "state refresh" in Flink ConnectedStream

We're building an application which has two streams:
A high-volume messages stream
A large static stream (originating from some Parquet files we have lying around) which we feed into Flink just to get that dataset into a saved state
We want to connect the two streams in order to get shared state, so that the 1st stream can use the 2nd state for enrichment.
Every day or so, the Parquet files (the 2nd stream's source) are updated, and that will require us to clear the state of the 2nd stream and rebuild it (this will probably take about 2 minutes).
The question is, can we block/delay messages from the 1st stream while this process is running?
Thanks.
There's currently no direct/easy way to block one stream on another stream, unfortunately. The typical solution is to buffer the ingest stream while you load (or re-load) the enrichment stream.
One approach you could try is to wrap your ingest stream in a custom SourceFunction that knows when to not generate data, based on some external trigger (which is the same signal you'd use to know that you have Parquet data to re-load).
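A minimal sketch of that idea, using the legacy SourceFunction interface; isReloadInProgress() and fetchNextRecord() are hypothetical stand-ins for your external signal and your real ingest logic:

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class PausableSource implements SourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            if (isReloadInProgress()) {
                Thread.sleep(500); // back off while the enrichment state is being rebuilt
                continue;
            }
            String record = fetchNextRecord();
            synchronized (ctx.getCheckpointLock()) { // emit under the checkpoint lock
                ctx.collect(record);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private boolean isReloadInProgress() { /* e.g. poll a flag file or control topic */ return false; }

    private String fetchNextRecord() { /* read from the actual ingest source */ return ""; }
}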
Sounds a bit like your case is similar to FLIP-23, which explores model serving in Apache Flink.
I think it all boils down to how (and if) your static stream is keyed:
if it is keyed in a similar way to your fast data, then you can key both streams, connect them, and then have access to the keyed context.
if the static stream events are not keyed in a similar fashion, maybe you should consider emitting control events which will trigger a refresh of those static files from an external source (e.g. S3). That's easier said than done, as there is no trivial way to guarantee that all parallel instances of your fast stream will get the control event.
You can use ListState as a buffer (see the sketch below); how you access it, though, depends on the shape of your data.
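For example, a minimal sketch of that buffering pattern, assuming both streams are keyed the same way; Event, Reference, and EnrichedEvent are placeholder types. Fast events are parked in ListState until the enrichment value for their key arrives:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class BufferingEnrichment extends KeyedCoProcessFunction<String, Event, Reference, EnrichedEvent> {

    private ListState<Event> buffer;         // fast events waiting for enrichment data
    private ValueState<Reference> reference; // latest enrichment value for this key

    @Override
    public void open(Configuration parameters) {
        buffer = getRuntimeContext().getListState(new ListStateDescriptor<>("buffer", Event.class));
        reference = getRuntimeContext().getState(new ValueStateDescriptor<>("reference", Reference.class));
    }

    @Override
    public void processElement1(Event event, Context ctx, Collector<EnrichedEvent> out) throws Exception {
        Reference ref = reference.value();
        if (ref == null) {
            buffer.add(event);               // no enrichment data yet: park the event
        } else {
            out.collect(enrich(event, ref)); // enrichment available: emit directly
        }
    }

    @Override
    public void processElement2(Reference ref, Context ctx, Collector<EnrichedEvent> out) throws Exception {
        reference.update(ref);
        for (Event event : buffer.get()) {   // flush everything buffered for this key
            out.collect(enrich(event, ref));
        }
        buffer.clear();
    }

    private EnrichedEvent enrich(Event e, Reference r) { /* your join logic */ return new EnrichedEvent(); }
}

Clearing and rebuilding the reference state on the daily refresh is left out here; doing that cleanly is exactly the control-event problem mentioned above.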
It might help if you shared a bit more info about the shape of your data (e.g. are you joining on a key? are you simply serving a model? something else?).
