How to get IoT Analytics to create a new row for each element in the input JSON array? - aws-iot

I am receiving a data record at a time as a JSON array from an IoT device in my channel. The received message looks like this:
{
  "state": {
    "reported": {
      "temperature": 24.28,
      "humidity": 37.67,
      "pressure": 1019.57,
      "proximity": 1485
    }
  }
}
The datastore currently contains:
{
  reported = {
    temperature = 24.28,
    humidity = 37.67,
    pressure = 1019.57,
    proximity = 1485
  }
}
My desired result is:
temperature humidity pressure proximity
Value1 Value2 Value3 Value4
AnotherValue1 AnotherValue2 AnotherValue3 AnotherValue4
How can I get IoT Analytics to create a new row in the datastore for each element within the received JSON array?

In order to have separate columns for the temperature, humidity, pressure, and proximity attributes in your datastore, you have the following options:
1. Modify the message directly using AddAttributes and RemoveAttributes activities in your pipeline (see the sketch below):
Add temperature, humidity, pressure, and proximity to the top level of the JSON via an AddAttributes activity.
Remove state from the top level of the JSON via a RemoveAttributes activity.
2. Modify the message directly using an AWS Lambda activity that transforms the JSON so that temperature, humidity, pressure, and proximity become top-level keys and state and reported are no longer keys.
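For option 1, the pipeline definition could look roughly like the boto3 sketch below. The channel, datastore, pipeline, and activity names are placeholders, and the exact attribute paths may need adjusting to your payload; treat this as a sketch, not a definitive configuration.

import boto3

client = boto3.client("iotanalytics")

client.create_pipeline(
    pipelineName="sensor_pipeline",                        # placeholder name
    pipelineActivities=[
        {"channel": {"name": "from_channel",
                     "channelName": "my_channel",           # placeholder channel
                     "next": "flatten"}},
        # AddAttributes: copy the nested values to top-level attributes.
        {"addAttributes": {"name": "flatten",
                           "attributes": {
                               "state.reported.temperature": "temperature",
                               "state.reported.humidity": "humidity",
                               "state.reported.pressure": "pressure",
                               "state.reported.proximity": "proximity"},
                           "next": "drop_state"}},
        # RemoveAttributes: drop the original nested "state" object.
        {"removeAttributes": {"name": "drop_state",
                              "attributes": ["state"],
                              "next": "to_datastore"}},
        {"datastore": {"name": "to_datastore",
                       "datastoreName": "my_datastore"}},    # placeholder datastore
    ],
)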
If instead you want to maintain the current structure of the message in your datastore, but would like your query results to be in the format you specified, you can accomplish this via the following sample query.
SELECT state.reported.temperature, state.reported.humidity, state.reported.pressure, state.reported.proximity FROM datastore WHERE state IS NOT NULL AND state.reported IS NOT NULL
For more information about this query see SQL expressions in IoT Analytics.

Related

Caching refresh and update strategy inputs

I am building a Redis cache to store product data, e.g. key-value pairs such as:
key -> testKey
value [json] ->
{
  "testA": "A",
  "testB": "B",
  "testC": "C"
}
The problem I am struggling with is what happens if I get two requests to update this value for the same key:
request1 wants to change "testB" to "Bx"
request2 wants to change "testC" to "Cx"
How do I handle the inconsistency? Based on my understanding, one request will read the data above and update only the testB value while the other updates the testC value, because they run in parallel and a new request does not wait for the previous update to propagate to the cache.
How do we maintain data consistency with Redis?
I can think of putting a locking/transactional database in front, but that would hurt the latency of the real-time data.
It depends on which data structure you choose in Redis.
In your case a Hash is a good way to store all of the fields of your value. Use the HSET command to update the target field: it is guaranteed to update only that single field, and because Redis executes commands sequentially you will not run into concurrency issues between the two updates.
You can also use a String to store the raw JSON and serialize/deserialize it on every read and update. In that case you do need to consider concurrency, because the read-modify-write cycle is not an atomic operation (a distributed lock could be the solution).
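A minimal redis-py sketch of the Hash approach, using the key and fields from the question (the connection settings are assumptions):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store the product as a hash instead of a single JSON string.
r.hset("testKey", mapping={"testA": "A", "testB": "B", "testC": "C"})

# Each request updates only its own field. HSET touches just that field,
# and Redis executes commands sequentially, so the two updates cannot
# overwrite each other.
r.hset("testKey", "testB", "Bx")   # request1
r.hset("testKey", "testC", "Cx")   # request2

print(r.hgetall("testKey"))        # {'testA': 'A', 'testB': 'Bx', 'testC': 'Cx'}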

FLINK ,trigger event based on JSON dynamic input data ( like map object data)

I would like to know whether Flink can support my requirement. I have gone through a lot of articles, but I am not sure whether my case can be solved.
Case:
I have two input sources: a) Event and b) ControlSet.
Event sample data is:
event 1 -
{
  "id": 100,
  "data": {
    "name": "abc"
  }
}
event 2 -
{
  "id": 500,
  "data": {
    "date": "2020-07-10",
    "name": "event2"
  }
}
As you can see, event-1 and event-2 have different attributes inside "data", so treat "data" as a free-form field whose attribute names may or may not be the same across events.
ControlSet gives us the instruction to execute a trigger. For example, a trigger condition could be:
(id == 100 && name == "abc") OR (id == 500 && date == "2020-07-10")
Please help me understand whether this kind of scenario can run in Flink and what the best approach would be. I don't think CEP patterns or SQL can help here, and I am not sure whether the event DataStream can hold a JSON object and be queried with something like a JSON path.
Yes, this can be done with Flink. And CEP and SQL don't help, since they require that the pattern is known at compile time.
For the event stream, I propose to key this stream by the id, and to store the attribute/value data in keyed MapState, which is a kind of keyed state that Flink knows how to manage, checkpoint, restore, and rescale as necessary. This gives us a distributed map, mapping ids to hash maps holding the data for each id.
For the control stream, let me first describe a solution for a simplified version where the control queries are of the form
(id == key) && (attr == value)
We can simply key this stream by the id in the query (i.e., key), and connect this stream to the event stream. We'll use a RichCoProcessFunction to hold the MapState described above, and as these queries arrive, we can look to see what data we have for key, and check if map[attr] == value.
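Stripped of the Flink APIs, the check each instance performs is simply a lookup in that per-id map; the sketch below uses a plain Python dict to stand in for the keyed MapState (all names and values are illustrative):

# Stand-in for the keyed MapState: event id -> attribute/value pairs from "data".
state = {
    100: {"name": "abc"},
    500: {"date": "2020-07-10", "name": "event2"},
}

def check(key, attr, value):
    # Evaluate a simplified control query of the form (id == key) && (attr == value).
    data = state.get(key, {})
    return data.get(attr) == value

print(check(100, "name", "abc"))         # True
print(check(500, "date", "2020-07-11"))  # False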
To handle more complex queries, like the one in the question
(id1 == key1 && attr1 == value1) OR (id2 == key2 && attr2 == value2)
we can do something more complex.
Here we will need to assign a unique id to each control query.
One approach would be to broadcast these queries to a KeyedBroadcastProcessFunction that once again is holding the MapState described above. In the processBroadcastElement method, each instance can use applyToKeyedState to check on the validity of the components of the query for which that instance is storing the keyed state (the attr/value pairs derived from the data field in the event stream). For each keyed component of the query where an instance can supply the requested info, it emits a result downstream.
Then after the KeyedBroadcastProcessFunction we key the stream by the control query id, and use a KeyedProcessFunction to assemble together all of the responses from the various instances of the KeyedBroadcastProcessFunction, and to determine the final result of the control/query message.
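The final assembly amounts to grouping the partial results by control-query id and combining them according to the query's boolean structure; for an OR of clauses it could look like this sketch (plain Python, not Flink API; the tuple layout is an assumption):

from collections import defaultdict

# Partial results emitted downstream: (query_id, clause_index, matched).
partials = [
    ("q1", 0, True),
    ("q1", 1, False),
]

# Group by query id; an OR query is satisfied if any of its clauses matched.
by_query = defaultdict(list)
for query_id, clause_index, matched in partials:
    by_query[query_id].append(matched)

results = {query_id: any(flags) for query_id, flags in by_query.items()}
print(results)   # {'q1': True}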
It's not really necessary to use broadcast here, but I found this scheme a little more straightforward to explain. But you could instead route keyed copies of the query to only the instances of the RichCoProcessFunction holding MapState for the keys used in the control query, and then do the same sort of assembly of the final result afterwards.
That may have been hard to follow. What I've proposed involves composing two techniques I've coded up before in examples: https://github.com/alpinegizmo/flink-training-exercises/blob/master/src/main/java/com/ververica/flinktraining/solutions/datastream_java/broadcast/TaxiQuerySolution.java is an example that uses broadcast to trigger the evaluation of query predicates across keyed state, and https://gist.github.com/alpinegizmo/5d5f24397a6db7d8fabc1b12a15eeca6 is an example that uses a unique id to re-assemble a single response after doing multiple enrichments in parallel.

Best practice to store and update Time-series data in an mongodb

I am using MongoDB to store the data from a sensor. The sensor pushes the data via MQTT, and Node.js (LoopBack) is used to persist the data in MongoDB.
The following properties need to be saved in a document:
{
  "time": "2019-01-01T14:22:55.691Z",
  "value1": 0,
  "value2": 50    (value2 will be received a few minutes after value1)
}
Story: I am using an ultrasonic sensor to check the water level in a tank. When I turn on the water pump I save the water level as value1, and when I turn off the pump I want to save it as value2. Since there will be many documents, how can I update the right document with value2? Do I query for the latest "time" property, or is there a better way?
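A minimal pymongo sketch of the approach described above (update the most recent reading that does not yet have value2); the database, collection, and connection details are placeholders, not the actual setup:

from pymongo import DESCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
readings = client["iot"]["readings"]    # hypothetical database/collection names

def save_value2(value2):
    # Update the newest document without value2, i.e. the reading written
    # when the pump was switched on.
    return readings.find_one_and_update(
        {"value2": {"$exists": False}},
        {"$set": {"value2": value2}},
        sort=[("time", DESCENDING)],
    )

save_value2(50)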

How to get the min and max value from a dbcontext

I am storing sensor data in a SQL database. Each entry has a device_id, a timestamp, a value, and some other properties.
I want to implement a Web API function that returns the first and last event I have received from a sensor device, so my serialized DTO should look something like this:
{ "start": <timestamp:long>, "end": <timestamp:long>, "deviceid": <someid:string> }
I am using Entity Framework Core to access my database. I can get a list of unique device ids without any problems, but if I try to get the min or max value of the timestamp per device, my DbContext throws an InvalidOperationException.
I tried several things like this:
var deviceList = _context.Data.Select(d => d.DeviceId).Distinct();
foreach (string deviceId in deviceList)
{
    var max = _context.Data.Where(g => g.DeviceId == deviceId).Max(c => c.Timestamp);
    var min = ...
}
This snippet throws an InvalidOperationException from the CLR. As far as I understand it, this builds a statement that is not executed on the server side. My question is: how can I create a query against the DbContext that returns the min and max value per device? I would also like to know the recommended way to implement this so that it is not executed on the client side.
Thank you, Werner.
I expect your _context.Data refers to all of the records (rows) in the table, so you can use a group-by to get this. Try the query below and let me know if it works for you.
First we group by the device id, and then for each group we take the min and max timestamp.
var data = _context.Data
    .GroupBy(f => f.DeviceId)              // one group per device
    .Select(g => new
    {
        deviceId = g.Key,
        start = g.Min(r => r.Timestamp),   // earliest event
        end = g.Max(r => r.Timestamp)      // latest event
    });

Get database last N data points from each node (Cloudant/couchdb)

TL;DR: MapReduce or POST request?
What is the correct (i.e. most efficient) way to fetch the latest n data points of multiple sensors from Cloudant or an equivalent database?
Sensor data is stored in individual documents like this:
{
  "_id": "2d26dbd8e655ae02bdab611afc92b6cf",
  "_rev": "1-a64448521f05935b915e4bee12328e84",
  "date": "2017-06-20T15:59:50.509Z",
  "name": "Sensor01",
  "temperature": 24.5,
  "humidity": 45.3,
  "rssi": -33
}
I want to fetch the latest 10 documents from sensor01-sensor99 so I can feed them to the UI.
I have discovered a few options:
1. Use a map/reduce view
Reduce each sensor's data to an array under sensor01, sensor02, etc.
E.g.:
Map:
function (doc) {
  if (doc.name && doc.temperature) emit(doc.name, doc.temperature);
}
Reduce:
function (keys, values, rereduce) {
  var temp_arr = [];
  for (var i = 0; i < values.length; i++) {
    temp_arr.push(values[i]);
  }
  return temp_arr;
}
I couldn't get this to work, but I think the method should be viable.
2. Multi-document fetching
{
  "queries": [
    {sensor01}, {sensor02}, {sensor03}, ...
  ]
}
Where each {sensor0x} is filtered using
{"startkey": [sensors[i], {}], "endkey": [sensors[i]], "limit": 5}
This way I can order documents using ?descending=true
I implemented this and it works, but I have my doubts about using it if I have 1000 sensors with 10000 data points each, and for hundreds of sensors I would need to send a very large POST request.
Is there something better?
Is my architecture even correct: storing sensor data in individual documents and then filling the UI by fetching all the data through the REST API?
Thank you very much!
There's nothing wrong with your method of storing one reading per document, but there's no truly efficient way of getting "the last n data points" for a number of sensors.
We could create a MapReduce function:
function (doc) {
  if (doc.name && doc.temperature && doc.date) {
    emit([doc.name, doc.date], doc.temperature);
  }
}
This creates an index ordered by name and date.
We can access the most recent readings for a single sensor by querying the view:
/_design/mydesigndoc/_view/myview?startkey=["Sensor01","2020-01-01"]&descending=true&limit=10
This fetches readings for "Sensor01" in newest-first order:
startkey & endkey are reversed when doing descending=true
descending=true means the rows are returned in reverse (newest-first) order
limit - the number of readings required (or n in your parlance)
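For illustration, calling that view over HTTP could look like the Python sketch below (the account URL, database, credentials, and design document names are placeholders):

import json
import requests

BASE = "https://ACCOUNT.cloudant.com/mydb"     # placeholder account and database
AUTH = ("apikey", "apipassword")               # placeholder credentials

resp = requests.get(
    BASE + "/_design/mydesigndoc/_view/myview",
    params={
        "startkey": json.dumps(["Sensor01", "2020-01-01"]),  # view keys are JSON-encoded
        "descending": "true",
        "limit": 10,
    },
    auth=AUTH,
)
for row in resp.json()["rows"]:
    print(row["key"], row["value"])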
This is a very efficient use of Cloudant/CouchDB, but it only returns the last n readings for a single sensor. To retrieve other sensors' data, additional API calls would be required.
Creating an index like this:
function (doc) {
  if (doc.name && doc.temperature && doc.date) {
    emit(doc.date, doc.temperature);
  }
}
orders each reading by date. You can then retrieve the newest n readings with:
/_design/mydesigndoc/_view/myview?startkey="2020-01-01"&descending=true&limit=200
If all of your sensors are saving data at the same rate, then simply using a larger limit should get you the latest readings of all sensors.
This too is an efficient use of CouchDB/Cloudant.
You may also want to look at the built-in reducers (_count, _sum and _stats) to get the database to aggregate readings for you. They are a great way to create year/month/day groupings of IoT data.
In general, I would recommend not using custom reducers: they are many times less efficient than the built-in reducers, which are written in Erlang.
