Flink broadcast state: implementing a session window inside a process function

My Flink app is designed to process IoT data from sensors.
Sensors send data through gateways. This is what the sample data looks like:
case class Data(sensorId: String, value: Float, gatewayId: String, timestamp: Long)
Data from the same sensor can come through different gateways.
If a gateway is disconnected from the network, I receive a special event (case class GatewayEvents(gatewayId: String, event: String, timestamp: Long)) on a broadcast stream which is connected to the main data stream from the sensors.
A sensor may stop sending data in two cases:
it is broken
its gateway is disconnected from the network (I receive a GatewayEvents("gwId", "disconnected", 1617979694) message in the broadcast stream)
If I receive a message that some gateway was disconnected from the network, and the sensors that sent data through it stop sending data (for example, within 1 minute), I need to create a special event.
My partial implementation looks like this:
case class Data(sensorId: String, value: Float, gatewayId: String, timestamp: Long)
case class GatewayEvents(gatewayId: String, event: String, timestamp: Long)
val sensorData: DataStream[Data] ...
val gwData: DataStream[GatewayEvents] ...
val gatewayBroadcastStateDescriptor = new MapStateDescriptor[String, GatewayEvents]("gatewayEvents", classOf[String], classOf[GatewayEvents])
val broadcastGatewayEventsStream = gwData.broadcast(gatewayBroadcastStateDescriptor)
val events = sensorData
.keyBy(_.sensorId)
.connect(broadcastGatewayEventsStream)
.process(...)
I can't work out how to implement this process function. Any ideas? I think session windows could help me, but I can't figure out the best way to do it.

So, the simplest idea here would be to use timers. Basically, you could implement a KeyedCoProcessFunction in such a way that when it receives a gateway-disconnected message, it registers a (processing-time) timer to fire after the desired timeout. If any message arrives for the sensor, you simply delete the registered timer so that it won't fire. Inside the onTimer function you can then emit the desired event, since the timer firing means that no value arrived within the timespan.
One thing to note here is that if you keyBy(_.sensorId), the event will be generated for every sensor that was received through this gateway. If you want to emit only one event per gateway, you can simply change the partitioning to keyBy(_.gatewayId); a sketch of that variant follows.
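A minimal, hedged sketch of that keyed variant in Java (assuming Data and GatewayEvents are POJOs with public fields matching the case classes, and the 1-minute timeout from the question; it connects the two keyed streams directly instead of using the broadcast stream, because timers can only be registered on a keyed context):
sensorData
    .keyBy(d -> d.gatewayId)
    .connect(gwData.keyBy(e -> e.gatewayId))
    .process(new KeyedCoProcessFunction<String, Data, GatewayEvents, String>() {
        private ValueState<Long> pendingTimer; // timer registered after a disconnect

        @Override
        public void open(Configuration parameters) {
            pendingTimer = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("pendingTimer", Long.class));
        }

        @Override
        public void processElement1(Data data, Context ctx, Collector<String> out) throws Exception {
            // sensor data arrived: cancel the pending disconnect timer, if any
            Long timer = pendingTimer.value();
            if (timer != null) {
                ctx.timerService().deleteProcessingTimeTimer(timer);
                pendingTimer.clear();
            }
        }

        @Override
        public void processElement2(GatewayEvents gw, Context ctx, Collector<String> out) throws Exception {
            if ("disconnected".equals(gw.event)) {
                // fire in 1 minute unless sensor data arrives first
                long timer = ctx.timerService().currentProcessingTime() + 60_000L;
                ctx.timerService().registerProcessingTimeTimer(timer);
                pendingTimer.update(timer);
            }
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            // the timer fired: no sensor data arrived within a minute of the disconnect
            out.collect("gateway " + ctx.getCurrentKey() + " disconnected and its sensors went silent");
            pendingTimer.clear();
        }
    });
If you want to keep the broadcast variant instead, you would store the disconnect event in the broadcast state from processBroadcastElement and evaluate it on the keyed side, since the broadcast side cannot register per-key timers.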

Related

What is the meaning of 'sender=:1.478' in dbus-monitor?

I am currently analyzing D-Bus in Chromium OS (Chrome OS).
I captured the following meaningful D-Bus method calls when I press the 'Guest' button on the login UI:
my-cros # dbus-monitor --system "path=/org/chromium/SessionManager"
method call time=1632311881.319994 sender=:1.478 -> destination=org.chromium.SessionManager serial=378 path=/org/chromium/SessionManager; interface=org.chromium.SessionManagerInterface; member=LoadShillProfile
   string "$guest"
method call time=1632311881.319417 sender=:1.478 -> destination=org.chromium.SessionManager serial=371 path=/org/chromium/SessionManager; interface=org.chromium.SessionManagerInterface; member=SetFeatureFlagsForUser
   string "$guest"
   array [
   ]
   array [
   ]
I know that org.chromium.SessionManager is the one that starts the guest/Google-ID session.
By the way, what is the meaning of 'sender=:1.478'?
And how can I track the sender process?
Thank you in advance.
Firstly, you might find it easier to visualise what’s going on by using Bustle instead of dbus-monitor.
sender=:1.478 means the message you’re looking at was sent by the connection with unique ID :1.478 on the bus. Each connection to the bus (roughly, each process, although a process can actually have more than one connection) has a unique ID, and some connections also have ‘well-known’ IDs which look like reverse-DNS names. For example org.chromium.SessionManager.
You can track the sender process by looking for the same unique ID appearing as the sender or destination of other messages. Using Bustle will make this easier, as it can group and filter messages by sender/destination.
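If you just need the process behind a unique ID, you can also ask the bus daemon directly. A hedged example using the standard org.freedesktop.DBus.GetConnectionUnixProcessID method (substituting the unique ID from your capture):
dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.GetConnectionUnixProcessID string:":1.478"
The reply is a uint32 PID that you can look up with ps. Note that this only works while the connection is still open; unique IDs are never reused, but they disappear when the connection closes.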

Packet forwarding event in Contiki

I am doing some work on worm-attack detection in RPL. In RPL, the communication between clients can take multiple hops, with packets going through many nodes.
However, only the receiver gets a tcpip_event on reception of a packet. The nodes that the route passes through do not get this event. Is there any way to detect the packet on the intermediate nodes?
You cannot get a notification or callback when a packet is forwarded. However, you can get a callback when a packet is received or sent by the lower layers.
In Contiki, use the function rime_sniffer_add for that. Check apps/powertrace/powertrace.c for an example.
In Contiki-NG the function has been renamed to netstack_sniffer_add.
Usage example:
Declare the sniffer like this, in the global scope:
RIME_SNIFFER(packet_sniffer, input_packet, output_packet);
Then add the sniffer from your code, once, at the start of the application execution:
rime_sniffer_add(&packet_sniffer);
The functions input_packet and output_packet are callbacks defined by you and can be used to examine the packets (the output callback receives the MAC status); for example, like this:
static void
input_packet(void)
{
  int rssi = (int)packetbuf_attr(PACKETBUF_ATTR_RSSI);
  printf("received a packet with RSSI=%d\n", rssi);
}

static void
output_packet(int mac_status)
{
  /* called after the lower layers have sent a packet */
}

In Flink, how to verify that the same user's data is not received in a given window?

I have an IoT device emitting data to a Kafka topic, with fields like firstname, lastname, emailId, event_time, etc.
I have to verify that no other event is received for the same user within the defined window of the stream-processing operation.
For example, if I get user X's details 3 times within a window of 5 minutes, I should process (add to the sink) only the first record received from user X, and the next two records should be discarded.
The most obvious solution would be to key the events by user data and reduce them, leaving only the first one.
Something like this:
dataStream
    .keyBy(event -> event.emailId()) // key by emailId
    .reduce(new ReduceFunction<Event>() {
        @Override
        public Event reduce(Event value1, Event value2)
                throws Exception {
            return value1; // always keep only the first event
        }
    });
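One caveat: a keyed reduce without a window emits an updated result for every incoming element, so the duplicates still reach the downstream operator. If you want exactly one record per user per 5 minutes, you can run the same reduce inside a tumbling window; a minimal sketch, assuming processing-time windows and the same Event type as above:
dataStream
    .keyBy(event -> event.emailId())
    .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
    .reduce((first, second) -> first) // keep only the first event per user and window
    .addSink(...); // whatever sink you are using
The windowed variant emits the surviving record once, when the window fires, rather than on every incoming event.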

Check whether the stream is being received properly for all keys

I have the following scenario: suppose there are 20 sensors which are sending me streaming feed. I apply a keyBy (sensorID) against the stream and perform some operations such as average etc. This is implemented, and running well (using Flink Java API).
Initially it's all going well and all the sensors are sending me their feed. After a certain time, it may happen that a couple of sensors start misbehaving and I start getting an irregular feed from them, e.g. I receive a feed from 18 sensors, but 2 don't send a feed for long durations.
We can assume that I already know the fixed list of sensorIds (possibly hard-coded / or in a database). How do I identify which two are not sending a feed? Where can I get the list of key IDs to compare with the list in the database?
I want to raise an alarm if I don't get a feed (e.g. after 2 mins, 5 mins, 10 mins etc., with increasing priority).
Has anyone implemented such a scenario using flink-streaming / patterns? Any suggestions please.
You could technically use a ProcessFunction and timers.
You could simply register a timer for each record and reset it whenever you receive data. If you schedule the timer to fire after 5 minutes of processing time, this basically means that if you haven't received any data by then, it will call the onTimer function, from which you can emit an alert. It would also be possible to re-register timers for already-fired alerts, to allow emitting alerts with higher severity.
Note that this will only work assuming that, initially, all sensors are working correctly. Specifically, it will only emit alerts for keys that have been seen at least once. But from your description it seems that it would solve your problem.
I just happen to have an example of this pattern lying around. It'll need some adjustment to fit your use case, but should get you started.
public class TimeoutFunction extends KeyedProcessFunction<String, Event, String> {
    private ValueState<Long> lastModifiedState;
    static final int TIMEOUT = 2 * 60 * 1000; // 2 minutes

    @Override
    public void open(Configuration parameters) throws Exception {
        // register our state with the state backend
        lastModifiedState = getRuntimeContext().getState(new ValueStateDescriptor<>("myState", Long.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<String> out) throws Exception {
        // update our state and timer
        Long current = lastModifiedState.value();
        if (current != null) {
            ctx.timerService().deleteEventTimeTimer(current + TIMEOUT);
            current = Math.max(current, event.timestamp());
        } else {
            current = event.timestamp();
        }
        lastModifiedState.update(current);
        ctx.timerService().registerEventTimeTimer(current + TIMEOUT);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // emit an alert for this key
        String deviceId = ctx.getCurrentKey();
        out.collect(deviceId);
    }
}
This assumes a main program that does something like this:
DataStream<String> result = stream
.assignTimestampsAndWatermarks(new MyBoundedOutOfOrdernessAssigner(...))
.keyBy(e -> e.deviceId)
.process(new TimeoutFunction());
As @Dominik said, this only emits alerts for keys that have been seen at least once. You could fix that by introducing a secondary source of events that creates an artificial event for every source that should exist, and unioning that stream with the primary source, as sketched below.
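A minimal sketch of that idea, assuming the same hypothetical Event(deviceId, timestamp) type used above and a hard-coded list of known sensors (in practice you would read the list from your database); note the union happens before timestamps and watermarks are assigned:
List<String> knownSensors = Arrays.asList("sensor-1", "sensor-2"); // the fixed list you already have
DataStream<Event> primer = env
        .fromCollection(knownSensors)
        .map(new MapFunction<String, Event>() {
            @Override
            public Event map(String id) {
                // one artificial event per sensor that should exist
                return new Event(id, 0L);
            }
        });
DataStream<String> result = stream.union(primer)
        .assignTimestampsAndWatermarks(new MyBoundedOutOfOrdernessAssigner(...))
        .keyBy(e -> e.deviceId)
        .process(new TimeoutFunction());
With a timestamp of 0, the artificial events will raise an alert almost as soon as the watermark advances; if that is too eager, stamp them with the job start time instead.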
The pattern is very clear to me now. I've implemented the solution and it works like a charm.
If anyone needs the code, I'll be happy to share it.

How to send/receive binary data over TCP using NodeMCU?

I've been trying to mount a custom protocol over the TCP module on the NodeMCU platform. However, the protocol I'm trying to embed inside the TCP data segment is binary, not ASCII-based (like HTTP, for example), so sometimes it contains a NUL char (byte 0x00), which terminates the C string inside the TCP module implementation and causes that part of the message inside the packet to get lost.
-- server listens on 80, if data received, print data to console and send "hello world" back to caller
-- 30s timeout for an inactive client
sv = net.createServer(net.TCP, 30)
function receiver(sck, data)
print(data)
sck:close()
end
if sv then
sv:listen(80, function(conn)
conn:on("receive", receiver)
conn:send("hello world")
end)
end
This is a simple example in which, as you can see, 'receiver' is a callback function that prints the data from the TCP segment retrieved by the listener.
How can this be fixed? Is there a way to circumvent this using the NodeMCU library? Or do I have to implement another TCP module, or modify the current one's implementation, to support arrays or tables as a return value instead of strings?
Any suggestion is appreciated.
The data you receive in the callback should not be truncated. You can check this for yourself by altering the code as follows:
function receiver(sck, data)
print("Len: " .. #data)
print(data)
sck:close()
end
You will observe that, while the data is indeed only printed up to the first zero byte (by the print() function), the whole payload is present in the Lua string data, and you can process it properly with 8-bit-safe (and zero-byte-safe) methods such as string.byte().
While it should be easy to modify the print() function to also be zero-byte-safe, I do not consider this a bug, since the print function is meant for text. If you want to write binary data to serial, use uart.write(), e.g.
uart.write(0, data)
