I'm trying to store IoT data from data loggers that can have a variety of sensors attached; an example message is below. Each logger sends an MQTT message every 20 seconds.
{
  "state": {
    "reported": {
      "batv": 5105,
      "ts": 1614595073655,
      "temp": 20,
      "humidity": 50
    }
  }
}
My question is about storing these MQTT messages/readings efficiently in a DynamoDB table: should I store the readings in a Map containing Maps, like this? (Note: this is what I'm currently doing, and when the number of readings gets large, the item is very slow to load in the AWS DynamoDB console.)
{
"readings": {
"ts1614592810955": {
"battery_level": 5089,
"temp": 20,
"humidity": 50
},
"ts1614593692395": {
"battery_level": 5093,
"temp": 20,
"humidity": 50
}
},
"serial_number": "TDG_logger_thing"
}
The alternative, which I'm leaning towards, is to store the readings in a list:
{
"readings": [
{
"batv": 5105,
"ts": 1614594313407,
"temp": 20,
"humidity": 50
},
{
"batv": 5105,
"ts": 1614594313555,
"temp": 20,
"humidity": 50
}
],
"serial_number": "TDG_Logger_Thing"
}
Does anyone with knowledge of DynamoDB or of storing IoT data have any suggestions? They would be greatly appreciated.
(BTW, the flow of data is:)
Data Logger -> AWS IoT -> AWS Lambda -> DynamoDB
DDB List operations can be a limiting factor when you need to reliably modify attributes held in the List.
Example - List
In a List, to set temp to 30 where ts = 1614594313407, you would need to fetch the List from DDB, traverse it until you find the object with ts = 1614594313407, set temp to 30, then write the List back to DDB. That read-modify-write cycle is not atomic, so it is not quite transactional.
[
{
"batv": 5105,
"ts": 1614594313407,
"temp": 20,
"humidity": 50
},
{
"batv": 5105,
"ts": 1614594313555,
"temp": 20,
"humidity": 50
}
]
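A minimal sketch of that read-then-write flow, in Python, may make the weak spot easier to see. It assumes the list has already been fetched, and the helper names `find_reading_index` and `build_list_update` are hypothetical. Instead of rewriting the whole list it uses an index-based path, which DynamoDB update expressions accept, plus a condition (my addition, not part of the answer above) that rejects the write if the element at that index no longer has the expected ts. The `#temp` and `#ts` aliases are only a precaution against reserved-word clashes. The caller would still need to add the item's `Key`.

```python
def find_reading_index(readings, ts):
    """Return the position of the reading with the given ts, or None."""
    for index, reading in enumerate(readings):
        if reading["ts"] == ts:
            return index
    return None


def build_list_update(index, ts, new_temp):
    """Build UpdateItem arguments that target one list element by index."""
    path = f"readings[{index}]"
    return {
        "UpdateExpression": f"SET {path}.#temp = :temp_val",
        # Assumption: refuse the write if the list shifted since it was read.
        "ConditionExpression": f"{path}.#ts = :expected_ts",
        "ExpressionAttributeNames": {"#temp": "temp", "#ts": "ts"},
        "ExpressionAttributeValues": {":temp_val": new_temp, ":expected_ts": ts},
    }


# The list as it was read from the table (same data as the example above).
readings = [
    {"batv": 5105, "ts": 1614594313407, "temp": 20, "humidity": 50},
    {"batv": 5105, "ts": 1614594313555, "temp": 20, "humidity": 50},
]

index = find_reading_index(readings, 1614594313407)
update_args = build_list_update(index, 1614594313407, 30)
```

Even with the condition, this still costs a read before every write, which is the overhead the Map layout avoids.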
Example - Map
With a Map, you can reliably update the value of temp to 30 for the reading keyed ts1614592810955 in a single update, "SET readings.#ts_id.temp = :temp_val", where #ts_id is an expression attribute name that resolves to the timestamp key.
{
"readings": {
"ts1614592810955": {
"battery_level": 5089,
"temp": 20,
"humidity": 50
},
"ts1614593692395": {
"battery_level": 5093,
"temp": 20,
"humidity": 50
}
},
"serial_number": "TDG_logger_thing"
}
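As a rough illustration of the Map update, the arguments for an UpdateItem call could be assembled like this in Python. The helper name `build_temp_update` is hypothetical, and the sketch assumes `serial_number` is the table's partition key. It also aliases `temp` as `#temp`, which goes slightly beyond the expression quoted above; that is only a precaution in case the attribute name clashes with a reserved word.

```python
def build_temp_update(serial_number, ts_key, new_temp):
    """Build keyword arguments for a DynamoDB UpdateItem call."""
    return {
        # Assumption: serial_number is the partition key of the table.
        "Key": {"serial_number": serial_number},
        # Placeholders keep the dynamic map key out of the expression text.
        "UpdateExpression": "SET readings.#ts_id.#temp = :temp_val",
        "ExpressionAttributeNames": {"#ts_id": ts_key, "#temp": "temp"},
        "ExpressionAttributeValues": {":temp_val": new_temp},
    }


params = build_temp_update("TDG_logger_thing", "ts1614592810955", 30)
```

With a boto3 Table resource these would be passed as `table.update_item(**params)`; no prior read of the item is needed.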
I would not use a map or a list. Instead, I would split those readings and store them in separate items, all sharing the same partition key (such as the device id), with a sort key for every reading that includes the timestamp. That way you can more easily query for all temp data, and with the timestamp in the sort key you can use a query to fetch only the measurements from a specific period.
So the primary key would be:
PK[device id] - SK[Measurement type - Date time] : (Attributes per measurement)
After that you can store whatever data you need for each individual measurement, and you can quickly update and retrieve individual measurements. Hope it helps.
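One way this key layout might look in practice is sketched below in Python. The attribute names `PK`, `SK` and `value`, the `#` separator, and both helper functions are assumptions made for illustration. The timestamp is zero-padded so that string ordering of the sort key matches numeric ordering, which is what lets a range condition on the sort key select a time window. The last line only simulates that range condition locally.

```python
def reading_to_items(device_id, reading):
    """Split one reported reading into one item per measurement type."""
    ts = reading["ts"]
    items = []
    for name, value in reading.items():
        if name == "ts":
            continue
        items.append({
            "PK": device_id,
            # Measurement type first, then a fixed-width timestamp.
            "SK": f"{name}#{ts:013d}",
            "value": value,
        })
    return items


def sort_key_range(measurement, start_ms, end_ms):
    """Return the lower and upper sort-key bounds for a time window."""
    return (f"{measurement}#{start_ms:013d}", f"{measurement}#{end_ms:013d}")


reading = {"batv": 5105, "ts": 1614595073655, "temp": 20, "humidity": 50}
items = reading_to_items("TDG_logger_thing", reading)

low, high = sort_key_range("temp", 1614595000000, 1614596000000)
# Local stand-in for a Query with a BETWEEN condition on the sort key.
temp_in_range = [item for item in items if low <= item["SK"] <= high]
```

Against a real table the same bounds would go into a Query key condition, for example `Key("PK").eq(device_id) & Key("SK").between(low, high)` with boto3.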
I have many JSON resources similar to the one below, but I need to fetch only the resources that satisfy these two conditions:
(1) component.code.text == Diastolic Blood Pressure
(2) valueQuantity.value < 90
This is the JSON object/resource:
{
"fullUrl": "urn:uuid:edf9439b-0173-b4ab-6545-3b100165832e",
"resource": {
"resourceType": "Observation",
"id": "edf9439b-0173-b4ab-6545-3b100165832e",
"component": [ {
"code": {
"coding": [ {
"system": "http://loinc.org",
"code": "8462-4",
"display": "Diastolic Blood Pressure"
} ],
"text": "Diastolic Blood Pressure"
},
"valueQuantity": {
"value": 81,
"unit": "mm[Hg]",
"system": "http://unitsofmeasure.org",
"code": "mm[Hg]"
}
}, {
"code": {
"coding": [ {
"system": "http://loinc.org",
"code": "8480-6",
"display": "Systolic Blood Pressure"
} ],
"text": "Systolic Blood Pressure"
},
"valueQuantity": {
"value": 120,
"unit": "mm[Hg]",
"system": "http://unitsofmeasure.org",
"code": "mm[Hg]"
}
} ]
}
}
I need to write a condition to fetch the resource with text: "Diastolic Blood Pressure" AND valueQuantity.value > 90
I have written the following code:
def self.hypertension_observation(bundle)
entries = bundle.entry.select {|entry| entry.resource.is_a?(FHIR::Observation)}
observations = entries.map {|entry| entry.resource}
hypertension_observation_statuses = ((observations.select {|observation| observation&.component&.at(0)&.code&.text.to_s == 'Diastolic Blood Pressure'}) && (observations.select {|observation| observation&.component&.at(0)&.valueQuantity&.value.to_i >= 90}))
end
I get output without any error, but the second condition is not applied: the output even contains values < 90.
Could anyone please help me correct this Ruby code so that it fetches only the output whose value is < 90?
I will write out what I would do for a problem like this, based on the (edited) version of your JSON data. I'm inferring that the full JSON file is a list of records with medical data, and that we want to fetch only the records for which the individual's diastolic blood pressure reading is < 90.
If you want to do this in Ruby, I recommend using the JSON parser that comes with your Ruby distribution. It takes some (hopefully valid) JSON data and returns the matching Ruby structure, here an array of hashes, each with nested arrays and hashes. In my solution I saved the JSON you posted to a file, so I would do something like this:
require 'json'
require 'pp'

# Assumes medical_info.json holds an array of records shaped like the one in the question.
json_data = File.read("medical_info.json")
parsed_data = JSON.parse(json_data)

# Keep only the records whose first component is a diastolic reading below 90.
fetched_data = parsed_data.select do |record|
  component = record["resource"]["component"][0]
  diastolic_text = component["code"]["text"]
  diastolic_value_quantity = component["valueQuantity"]["value"]
  diastolic_text == "Diastolic Blood Pressure" && diastolic_value_quantity < 90
end

pp fetched_data
This will print a new array of hashes which contains only the records with the desired values for diastolic pressure. The 'pp' library (part of Ruby's standard library) is for 'Pretty Print', which isn't perfect but makes the hierarchy a little easier to read.
When I'm faced with deeply nested JSON data that I want to parse in Ruby, I save the JSON data to a file, as I did here, and then run IRB in the directory where the file is, so I can experiment with accessing the hash values and array elements I'm looking for.
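For reference, the code in the question combines two separate `select` results with `&&`. In Ruby an array is always truthy, so that expression simply evaluates to the second array and the text condition is lost. Both conditions need to be tested on the same component inside a single filter. The sketch below shows that shape in Python on plain dictionaries (the function name and the sample records are hypothetical); unlike the code above, it checks every component rather than assuming the diastolic reading is at index 0. The same shape carries over to Ruby with `select` and `any?`.

```python
def diastolic_below(records, limit):
    """Return records having a diastolic component with a value below limit."""
    matches = []
    for record in records:
        components = record.get("resource", {}).get("component", [])
        # Both conditions must hold for the SAME component.
        if any(
            c.get("code", {}).get("text") == "Diastolic Blood Pressure"
            and c.get("valueQuantity", {}).get("value", float("inf")) < limit
            for c in components
        ):
            matches.append(record)
    return matches


def component(text, value):
    """Build a trimmed-down component like the ones in the question."""
    return {"code": {"text": text}, "valueQuantity": {"value": value}}


records = [
    {"resource": {"id": "a", "component": [
        component("Diastolic Blood Pressure", 81),
        component("Systolic Blood Pressure", 120)]}},
    {"resource": {"id": "b", "component": [
        component("Systolic Blood Pressure", 140),
        component("Diastolic Blood Pressure", 95)]}},
    {"resource": {"id": "c", "component": [
        component("Systolic Blood Pressure", 130),
        component("Diastolic Blood Pressure", 70)]}},
]

matching_ids = [r["resource"]["id"] for r in diastolic_below(records, 90)]
```

Flipping the comparison to `>=` would give the hypertension case instead.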
I am working with Cosmos DB and I want to write a SQL query that returns multiple documents embedded in one single document.
To elaborate, imagine you have the following two document types in one container. The OrderId of the Order document is referenced in the OrderDetail documents.
1. Order
{
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Name": "ABC",
"Type": "Order",
"DeptName": "ABC",
"TotalAmount": 100.05
}
2. OrderDetail
{
"OrderDetailId": "689bdc38-9849-4a11-b856-53f8628b76c9",
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Type": "OrderDetail",
"ItemNo": 202,
"Quantity": 10,
"UnitPrice": 10.05
},
{
"OrderDetailId": "789bdc38-9849-4a11-b856-53f8628b76c9",
"OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
"Type": "OrderDetail",
"ItemNo": 200,
"Quantity": 11,
"UnitPrice": 15.05
}
I want to write a query that returns all OrderDetail entries in one array, based on the referenced OrderId = "31d4c08b-ee59-4ede-b801-3cacaea38808".
The output should look like this:
{
"OrderId":"31d4c08b-ee59-4ede-b801-3cacaea38808",
"Name":"ABC",
"Type":"Order",
"OrderDetail":[
{
"OrderDetailId":"689bdc38-9849-4a11-b856-53f8628b76c9",
"Type":"OrderDetail",
"ItemNo":202,
"Quantity":10,
"UnitPrice":10.05
},
{
"OrderDetailId":"789bdc38-9849-4a11-b856-53f8628b76c9",
"Type":"OrderDetail",
"ItemNo":200,
"Quantity":11,
"UnitPrice":15.05
}
]
}
I have no idea how to query Cosmos DB to get the above result.
Your desired output is what a join in a relational database would produce. Cosmos DB is a non-relational database, which makes it a poor fit for this scenario. To my knowledge, no SQL query can produce the above output directly.
I suggest executing two queries. One produces:
{"OrderId":"31d4c08b-ee59-4ede-b801-3cacaea38808",
"Name":"ABC",
"Type":"Order"}
The other one produces:
"OrderDetail":[
{
"OrderDetailId":"689bdc38-9849-4a11-b856-53f8628b76c9",
"Type":"OrderDetail",
"ItemNo":202,
"Quantity":10,
"UnitPrice":10.05
},
{
"OrderDetailId":"789bdc38-9849-4a11-b856-53f8628b76c9",
"Type":"OrderDetail",
"ItemNo":200,
"Quantity":11,
"UnitPrice":15.05
}
]
Then combine them. You could also do this processing in a stored procedure.
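The combining step could be as small as the following Python sketch. It assumes the two result sets have already been fetched, for instance with one query filtered on Type = 'Order' and one on Type = 'OrderDetail', both for the same OrderId. The function name `embed_details` is hypothetical, and the choice of which fields to keep simply mirrors the desired output in the question.

```python
def embed_details(order, details):
    """Nest the matching OrderDetail documents inside the Order document."""
    combined = {key: order[key] for key in ("OrderId", "Name", "Type")}
    combined["OrderDetail"] = [
        # Drop the repeated OrderId from each embedded detail.
        {key: value for key, value in detail.items() if key != "OrderId"}
        for detail in details
        if detail["OrderId"] == order["OrderId"]
    ]
    return combined


order = {
    "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
    "Name": "ABC",
    "Type": "Order",
    "DeptName": "ABC",
    "TotalAmount": 100.05,
}
details = [
    {"OrderDetailId": "689bdc38-9849-4a11-b856-53f8628b76c9",
     "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
     "Type": "OrderDetail", "ItemNo": 202, "Quantity": 10, "UnitPrice": 10.05},
    {"OrderDetailId": "789bdc38-9849-4a11-b856-53f8628b76c9",
     "OrderId": "31d4c08b-ee59-4ede-b801-3cacaea38808",
     "Type": "OrderDetail", "ItemNo": 200, "Quantity": 11, "UnitPrice": 15.05},
]

result = embed_details(order, details)
```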
I want to query an array field in Elasticsearch. The field contains one or several node numbers of the GPU nodes that were allocated to a job. Different people may be using the same node at the same time, since a GPU node can be shared. I want to get the total number of distinct nodes that were in use at a specific time.
Say I have three rows of data which fall in the same time interval. I want to plot a histogram showing that there are three nodes occupied in that period. Can I achieve this on Kibana?
Example :
[3]
[3,4,5]
[4,5]
I am expecting an output of 3 since there were only 3 distinct nodes used.
Thanks in advance
You can accomplish this using a combination of a date histogram aggregation along with either a terms aggregation (if the exact number of nodes is important) or a cardinality aggregation (if you can accept some inaccuracy at higher cardinalities).
Full example:
# Start with a clean slate
DELETE test-index
# Create the index
PUT test-index
{
"mappings": {
"event": {
"properties": {
"nodes": {
"type": "integer"
},
"timestamp": {
"type": "date"
}
}
}
}
}
# Index a few events (using the rows from your question)
POST test-index/event/_bulk
{"index":{}}
{"timestamp": "2018-06-10T00:00:00Z", "nodes":[3]}
{"index":{}}
{"timestamp": "2018-06-10T00:01:00Z", "nodes":[3,4,5]}
{"index":{}}
{"timestamp": "2018-06-10T00:02:00Z", "nodes":[4,5]}
# STRATEGY 1: Cardinality aggregation (scalable, but potentially inaccurate)
POST test-index/event/_search
{
"size": 0,
"aggs": {
"active_nodes_histo": {
"date_histogram": {
"field": "timestamp",
"interval": "hour"
},
"aggs": {
"active_nodes": {
"cardinality": {
"field": "nodes"
}
}
}
}
}
}
# STRATEGY 2: Terms aggregation (exact, but potentially much more expensive)
POST test-index/event/_search
{
"size": 0,
"aggs": {
"active_nodes_histo": {
"date_histogram": {
"field": "timestamp",
"interval": "hour"
},
"aggs": {
"active_nodes": {
"terms": {
"field": "nodes",
"size": 10
}
}
}
}
}
}
Notes:
Terms vs. cardinality aggregation: Use the cardinality agg unless you need to know WHICH nodes are in use. It is significantly more scalable, and until you get into cardinality of 1000s, you likely won't see any inaccuracy.
Date histogram interval: You can play with the interval such that it's something that makes sense for you. If you run through the example above, you'll only see one histogram bucket, however if you change hour to minute, you'll see the histogram build itself out with more data points.
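To make explicit what the two aggregations compute together, here is a small client-side equivalent in Python: bucket the events by hour, collect the node numbers in each bucket into a set, and count the set. It is only an illustration of the logic on the three sample events, not a replacement for running the aggregation in Elasticsearch, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

events = [
    ("2018-06-10T00:00:00Z", [3]),
    ("2018-06-10T00:01:00Z", [3, 4, 5]),
    ("2018-06-10T00:02:00Z", [4, 5]),
]


def distinct_nodes_per_hour(events):
    """Count distinct node numbers in each hourly bucket."""
    buckets = defaultdict(set)
    for timestamp, nodes in events:
        parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
        # Truncate to the start of the hour, like a date histogram bucket key.
        hour = parsed.replace(minute=0, second=0)
        buckets[hour].update(nodes)
    return {hour: len(nodes) for hour, nodes in buckets.items()}


counts = distinct_nodes_per_hour(events)
```

All three events fall into one hourly bucket containing nodes 3, 4 and 5, so the count for that bucket is 3.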
My webapp needs to display several sorted lists of document attributes in a graph. These are hours, cycles, and age.
I have an AQL query that beautifully traverses the graph and gets me all the data my app needs in 2 ms. I'm very impressed! But I need it sorted for each graph. The query currently returns an array of json objects that contain all three of the attributes and the id for which they apply. Awesome. The query also very easily sorts on one of the attributes.
My problem is: I need to have a sorted list of all three, and would prefer not to query the database three times since the data is all in the same documents my traversal returned.
I would like to return three sorted arrays of JSON objects: one containing hours and the id, one containing cycles and the id, and one containing age and the id. This way, my app can display all three graphs without client-side sorting.
HTTP requests themselves are time consuming although the database is very fast, which is why I'd like to pull all three at once, as the data itself is small.
My current query is a simple graph traversal:
for v, e, p in outbound startNode graph 'myGraph'
filters & definitions...
sort v.hours desc
return {"hours": v.hours, "cycles": v.cycles, "age": v.age, "id": v.id}
Is there an easy way I can tell Arango to return me this structure?
{
[
{
"id": 47,
"hours": 123
},
{
"id": 23,
"hours": 105
}...
],
[
{
"id": 47,
"cycles": 18
},
{
"id": 23,
"cycles": 5
}...
],
[
{
"id": 47,
"age": 4.2
},
{
"id": 23,
"age": 0.9
}
]
}
Although the traversal is fast, I would prefer if I didn't have to re-traverse the graph three times to do it, if possible.
My solution:
let data = (for v, e, p in outbound startNode graph 'myGraph'
filters & definitions...
return {"hours": v.hours, "cycles": v.cycles, "age": v.age, "id": v.id})
let byHours = (for thing in data
sort thing.hours desc
return {"hours": thing.hours, "id": thing.id})
let byCycles = (for thing in data
sort thing.cycles desc
return {"cycles": thing.cycles, "id": thing.id})
let byAge = (for thing in data
sort thing.age desc
return {"age": thing.age, "id": thing.id})
return {"hours": byHours, "cycles": byCycles, "age": byAge}
I'm not sure how this compares against your solution performance-wise, but the most obvious solution would be to traverse once and then create three sorted results like this:
LET nodes = (
FOR v, e, p IN OUTBOUND startNode GRAPH 'myGraph'
FILTER ...
RETURN v
)
RETURN {
hours: (
FOR n IN nodes
SORT n.hours DESC
RETURN KEEP(n, ['hours', 'id'])
),
cycles: (
FOR n IN nodes
SORT n.cycles DESC
RETURN KEEP(n, ['cycles', 'id'])
),
age: (
FOR n IN nodes
SORT n.age DESC
RETURN KEEP(n, ['age', 'id'])
)
}
This would traverse the graph only once but sort the result three times.
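For readers less familiar with AQL subqueries, the same "collect once, sort three times" logic can be mirrored in plain Python to show the shape of the result. This is only an illustration with made-up node data; in the real setup the sorting happens inside the database as in the queries above.

```python
def sorted_views(nodes, attributes=("hours", "cycles", "age")):
    """Return one descending-sorted list of {attribute, id} per attribute."""
    return {
        attr: [
            {attr: node[attr], "id": node["id"]}
            for node in sorted(nodes, key=lambda n: n[attr], reverse=True)
        ]
        for attr in attributes
    }


# Hypothetical traversal result; values chosen so each sort order differs.
nodes = [
    {"id": 47, "hours": 123, "cycles": 18, "age": 4.2},
    {"id": 23, "hours": 105, "cycles": 5, "age": 0.9},
    {"id": 8, "hours": 110, "cycles": 30, "age": 2.0},
]

views = sorted_views(nodes)
```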
In Case 4 of this page, the query searches for all chairs less than 70 units in height:
curl localhost:9200/example/product/_search -d '{
"query": {
"filtered": {
"query": {
"match": {
"name": "chair"
}
},
"filter": {
"numeric_range": {
"size.height": {
"lt": 70
}
}
}
}
}
}'
Result:
"hits": [
{
"_id": "0",
"_source": {
"product": "chair",
"size": [
{
"width": 50,
"height": 50,
"depth": 50
},
{
"width": 75,
"height": 75,
"depth": 75
}
]
}
}
]
1) Why is the ID 0 for both chair sizes?
2) Why does the response show dimensions for the other chair that is 75 units in height?
1) The writer wanted to show a 1 to N relation, meaning there are 2 (in this case) types of chairs in his repository: a chair with dimensions of 50 and a chair with dimensions of 75. But both of them are still chairs, and the id of a chair is 0.
2) Because by default ES doesn't return partial results, it returns documents. In our case we have a chair document with a size array which holds 2 objects: one for the 50 dimension and one for the 75 dimension. The supplied query can either select the whole document or not.
If you want to convert the query to English you may say: Bring me all the documents which have the value "chair" in the product field and at least one of whose size.height values is lower than 70.
Even though the writer of the article is knowledgeable, I must say I don't like this kind of article, which tries to draw a direct line from the SQL world to a NoSQL implementation. If it were that easy, some big company would have written an automatic script that converts your SQL schemas to various NoSQL formats. In order to model your data correctly in NoSQL you must understand your products, the factors that should influence your decision, the use case and the data. There is no universal solution that will tell you: if you did it this way in an RDBMS, do it like this in ES.
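The document-level behaviour can be imitated in a few lines of Python: a document is a hit when any of its size entries satisfies the range, and the hit is the whole document, including the entries that did not match. The function name and the in-memory list are hypothetical stand-ins for the index.

```python
def search(documents, term, max_height):
    """Return whole documents where any size entry is below max_height."""
    hits = []
    for doc in documents:
        matches_term = doc["product"] == term
        has_small_size = any(size["height"] < max_height for size in doc["size"])
        if matches_term and has_small_size:
            # The entire document is returned, not just the matching size.
            hits.append(doc)
    return hits


documents = [
    {
        "_id": "0",
        "product": "chair",
        "size": [
            {"width": 50, "height": 50, "depth": 50},
            {"width": 75, "height": 75, "depth": 75},
        ],
    }
]

hits = search(documents, "chair", 70)
returned_heights = [size["height"] for size in hits[0]["size"]]
```

The 75-unit size appears in the result only because it lives in the same document as the 50-unit size that matched.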